basis. It is felt that the best approach for future subjective testing will
be  a parametric  approach  using  representative  male  and  female  talkers  to 
cover the expected range of pitch. An automated and refined version of
Voiers' Diagnostic Acceptability Measure (DAM) test is an attractive option.

Objective testing is considered as a possible alternative to subjective
testing. Reported here is a two part experimental study of the relationship
between a number of objective quality measures and the subjective
acceptability measures available from the PARM study. In the first part of
the study, controlled distortions were applied to speech samples in order to
measure the resolving power of the candidate objective measures on these
types of distortions. In the second part, the candidate objective measures
were applied to speech samples from the same systems on which PARM tests
were run, and the statistical correlation between the objective and
subjective measures was studied. Objective measures examined include
spectral distance measures (several LPC based spectral distances, LPC error
power ratio, and cepstral distance) as well as pitch comparison measures and
noise power measures. Controlled distortions were formant bandwidth,
frequency, pitch, low-pass bandwidth, and additive noise. Correlations with
subjective test data range from 0.2 to 0.8.

In the communicability test, a somewhat different point of view is taken.
The user is expected to perform on the data some cognitive task which is
measurable. The rationale here is that the user will be better able to
perform if the quality is high than if his cognitive resource, assumed
fixed, is saturated due to poorer quality transmission. The test format
chosen for this study was a multiple digit recall test similar to that
studied at Bell Labs by Nakatani. In this format, sequences of random digits
are first recorded by trained speakers, and then these utterances are
played through various distorting systems. The resulting sequences are
then played to subjects whose task is to "recall" the digits after a short
(~1 second) wait. These tests prove to be rather unpleasant to take, and
require larger numbers of subjects, but will differentiate among distorting
systems.



UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


PREFACE 


This  effort  was  conducted  by  the  School  of  Electrical  Engineering 
under  the  sponsorship  of  the  Rome  Air  Development  Center  Post-Doctoral 
Program for the Defense Communications Agency. Dr. W. R. Belfield of
the  Defense  Communications  Engineering  Center  was  the  task  project 
engineer  and  provided  overall  technical  direction  and  guidance. 

The  RADC  Post-Doctoral  Program  is  a cooperative  venture  between 
RADC  and  some  sixty-five  universities  eligible  to  participate  in  the 
program. Syracuse University (Department of Electrical and Computer
Engineering), Purdue University (School of Electrical Engineering),
Georgia Institute of Technology (School of Electrical Engineering), and
State University of New York at Buffalo (Department of Electrical
Engineering) act as prime contractor schools with other schools
participating via sub-contracts with the prime schools. The U.S.
Air Force Academy (Department of Electrical Engineering), Air Force
Institute of Technology (Department of Electrical Engineering), and
the Naval Post Graduate School (Department of Electrical Engineering)
also participate in the program.

The Post-Doctoral Program provides an opportunity for faculty
at participating universities to spend up to one year full time on
exploratory development and problem-solving efforts with the
post-doctorals splitting their time between the customer location and their
educational institutions. The program is totally customer-funded
with current projects being undertaken for Rome Air Development
Center (RADC), Space and Missile Systems Organization (SAMSO),
Aeronautical Systems Division (ASD), Electronic Systems Division
(ESD), Air Force Avionics Laboratory (AFAL), Foreign Technology
Division (FTD), Air Force Weapons Laboratory (AFWL), Armament
Development and Test Center (ADTC), Air Force Communications Service
(AFCS), Aerospace Defense Command (ADC), Hq USAF, Defense Communications
Agency (DCA), Navy, Army, Aerospace Medical Division (AMD), and
Federal Aviation Administration (FAA).

Further  information  about  the  RADC  Post-Doctoral  Program  can 
be obtained from Jacob Scherer, RADC, tel. AV 587-2543, COMM (315) 330-2543.
CHAPTER 1


INTRODUCTION 


1.1 Task History

The  engineering  effort  reported  on  here  was  performed  at  Georgia 
Institute  of  Technology  in  the  School  of  Electrical  Engineering  for 
the  Defense  Communications  Agency  through  the  Rome  Air  Development 
Center Post-Doctoral Program. The Post-Doctoral Program is under the
direction of Mr. Jake Scherer. The monitoring officer was Dr. William R.
Belfield of the Defense Communications Engineering Center (DCEC).

This task, an investigation of subjective speech quality testing,
objective speech quality testing, and communicability testing, was
undertaken following the development at DCEC of a large data base
associated with PARM and QUART (Paired Acceptability Rating Method and
Quality Acceptance Rating Test). The existence of this data base has
made  possible  the  detailed  analysis  of  subjective  testing  procedures, 
objective  testing  methods,  and  communicability  testing,  with  good 
cross  checking  and  validity  referencing  of  results. 

1.2  Speech  Digitization  Systems  and  Testing  Requirements 

Since it has for some years been clear that some form of end-to-end
speech digitization would be initiated in the Defense Communication
Systems, a number of speech digitization systems have been developed
in various laboratories around the country. Selecting from these
candidate systems the features to be included in a final system
requires extensive evaluation and testing. When a "final" system is
fielded, periodic field testing of all links for continued operational
quality will be a significant requirement. This study attempts to
identify efficient means for developmental and operational quality
testing.

1.3  Personnel,  Procedures,  and  Facilities 

This task has been carried out principally by Dr. T. P. Barnwell,
with Dr. A. M. Bush, and with the active involvement
of Dr. R. W. Schafer and Dr. R. M. Mersereau. Student Assistants have
included  Mr.  Ashfaq  Arastu,  Mr.  Bartow  Willingham,  and  Mr.  J.  D.  Marr 
here  at  Georgia  Tech.  This  group  also  consulted  on  two  occasions  with 
Dr.  W.  D.  Voiers  of  Dynastat,  Inc.,  Austin,  TX.  The  project  was  done 
for  and  with  the  active  help  of  Dr.  William  R.  Belfield  of  the  Defense 
Communications  Engineering  Center. 

Team  leader  was  Dr.  T.  P.  Barnwell.  The  project  was  initiated 
in May 1976 and completed in May 1977. Although six months of effort was
originally estimated, unavoidable delays in establishing the PARM data
base at Georgia Tech slowed progress. This report was prepared
at Georgia Tech, tentatively approved in rough draft form at DCEC, and
subsequently reproduced at Georgia Tech.

This  work  was  carried  out  in  the  School  of  Electrical  Engineering 
Digital  Signal  Processing  Facility.  A block  diagram  is  given  as 
Figure  1.1.  A more  detailed  description  of  the  facility  is  given  in 
Appendix  C. 

1.4  Technical  Organization 

The work reported here had as its ultimate goal the development
of efficient objective methods and tests for predicting user acceptance
of digital speech transmission systems. Three phases of the attack on
this goal were established: (a) summary investigation of subjective
testing methods; (b) development of a communicability test procedure;
(c) development of objective testing procedures.

The outputs of the study are recommendations for future
subjective test organization and implementation, specification of an
objective testing procedure with cross-validation against PARM
subjective testing results, and specification of a communicability test
philosophy and implementation of the test with results analyzed
statistically. A secondary output is the PARM data base now organized
for efficient searches.

Work progressed in all three phases in parallel, with some unexpected
delays due to the time required to obtain and organize the
data base from PARM (this is a large data base). A. M. Bush took
principal responsibility for the subjective testing portion, and T. P.
Barnwell was principally responsible for the objective test and the
communicability test. R. W. Schafer and R. M. Mersereau also contributed
to all three phases of the effort.

1.5  Organization  of  the  Report 

The  detailed  aspects  of  each  of  the  three  phases  of  the  effort 
are  presented  in  the  report  with  the  objective  testing  study  in  Chapter 
2, the subjective testing study in Chapter 3, and the communicability
test  in  Chapter  4.  Each  chapter  is  headed  by  an  introduction  giving 
the  philosophy  and  rationale  for  that  phase  of  the  work  and  the 
technical  perspective  required  for  that  phase. 
CHAPTER 2


OBJECTIVE MEASURES FOR SPEECH QUALITY


2.1  Introduction 

In recent years, considerable effort has been devoted to the
development and implementation of efficient algorithms for digitally
encoding speech signals. These algorithms, which are utilized
chiefly in digital communications systems and digital storage systems,
cover a wide range of techniques, and result in systems which vary
greatly in cost, complexity, data rate, and quality. Generally
speaking, modern speech digitization systems can be divided into four
categories: high rate systems which operate from ~100 KBPs to
~32 KBPs; intermediate rate systems which operate from ~32 KBPs to
~8 KBPs; low rate systems which operate from ~8 KBPs to ~1 KBPs; and
very low rate systems which operate below 1 KBPs. In the high rate
systems, PCM [2.1] and adaptive PCM [2.2] are the predominant
techniques. In the intermediate rate systems, the techniques are more varied,
including DM [2.3], ADM [2.4][2.5], DPCM [2.6], ADPCM [2.7], APC [2.8],
and adaptive transform coding [2.9]. The low rate systems consist mostly
of the vocoder techniques, including LPC [2.13], channel vocoders
[2.14][2.15], phase vocoders [2.20][2.21], and several other techniques
[2.22]. Very low rate systems usually involve feature extraction on a
perceptual or linguistic level, and, thus far, very few systems of this
type have been implemented. As a general rule, the higher data rate
systems are less expensive to implement and less sensitive to bit
errors, while the lower rate systems require more expensive terminals,
and result in greater distortions in the presence of errors.

The problem of rating and comparing these systems from the
standpoint of user acceptance is a difficult one, particularly since
the candidate systems are usually highly intelligible. Hence,
intelligibility tests, such as the DRT [2.23], may not suffice to resolve small
differences in acceptability. Direct user preference tests such as
the PARM [2.24] have been found useful for this purpose but are not highly
cost effective. Moreover, they provide no diagnostic information which
could be of value in remedying the deficiencies of systems being tested.

Objective measures which can be computed from sample speech
materials offer a possible alternative to subjective acceptability
measures. It should be noted, however, that the perception of speech
is a highly complex process involving not only the entire grammar and
the resulting syntactic structure of the language, but also such
diverse factors as semantic context, the speaker's attitude and emotional
state, and the characteristics of the human auditory system. Hence, the
development of a generally applicable algorithm for the prediction of
user reactions to any speech distortion must await the results of
future research. However, the effects of certain classes of distortion
are potentially predictable on the basis of present knowledge. In
particular, substantial progress has been made in quantifying the
importance of such acoustic features as pitch, intensity, spectral
fidelity, and speech/noise ratio to the intelligibility, speaker
recognizability, and overall acceptability of the received
speech signal. Thus far, little success has accompanied efforts to
predict the subjective consequences of other than relatively simple
forms of signal degradation, but recent developments in digital signal
processing techniques [2.25][2.26] suggest a number of efficient objective
measures which could be highly correlated with user acceptability.

In a recent study conducted by the Defense Department Consortium
on speech quality, a large number of speech digitization systems were
subjectively tested using the Paired Acceptability Rating Method (PARM)
Test [2.24] developed at the Dynastat Corporation. The systems tested
included a representative cross-section of the intermediate rate and
low rate systems which had been implemented in hardware at the time of
the study, and, consequently, offered a large user acceptability data
base covering most classes of distortion present in modern speech
digitization systems. The existence of the PARM data base offered
a unique opportunity to measure the ability of objective measures to
predict true subjective acceptability scores. Further, it allows the
development of precise methodologies for the utilization of objective
measures in conjunction with subjective measures to possibly reduce the
cost of speech system quality testing.

This  chapter  describes  a two  part  experimental  study  of  the 
relationship  between  a number  of  objective  quality  measures  and  the 
subjective  acceptability  measures  available  from  the  PARM  study.  In 
the  first  part  of  the  study,  controlled  distortions  were  applied  to 
speech samples in order to measure the resolving power of the candidate
objective measures on these types of distortion. In the second part,
the candidate objective measures were applied to speech samples from the
same systems on which the PARM tests were run, and the statistical
correlation between the objective and subjective measures was studied.

This chapter consists of five sections. In Section 2.2,
the choice of objective measures is discussed. In Section 2.3, the
"controlled distortion" experiment is presented. In Section 2.4, the
objective-subjective correlation experiment is described. Section 2.5
summarizes the results of this effort, and suggests directions for
future research.

2.2 The Choice of Objective Measures
2.2.1  The  Speech  Perception  Process 

Human speech perception is a complex process in which distortions
in the acoustic signal do not map simply onto perceived quality. In
this section, several aspects of speech perception which relate to
perceived speech quality will be discussed, and some general conclusions
will be drawn.

First, it should be noted that the syntactic structure of a
language has many components which impact speech perception. A sentence
in a language may be viewed as a concatenation of phonemes which are
hierarchically organized into syntactic and semantic units on a multitude
of levels. Phonemes are grouped into syllables, syllables into
words, and words into higher units (compounds, noun phrases, verb
phrases, clauses, sentences, etc.) based on the phrase structure of the
sentence [2.27]. Numerous modern linguists are trying to develop a
comprehensive grammatical theory for the generation of the syntactical
tree structures which represent the underlying sentence organization.
The point here is that a great deal more information than the identity
of the phonemes is being transmitted by the speech signal. Word
boundaries, phrase boundaries, and many other syntactic elements have
explicit correlates in the acoustics. It is these structural correlates
which allow the listener to understand the sentence structure, and hence to
use his great knowledge of the language to help him perceive the words
themselves. Researchers in speech synthesis [2.28][2.29] have found
that correctly producing the acoustic correlates of the syntax
is at least as important as correctly producing the acoustic
correlates of the phonemes.

There is yet another level of information transmitted in the
speech signal above the syntactic level. This level is semantic in
nature,  and  incorporates  the  speaker's  attitudes  about  the  subject 
matter  of  the  utterance.  Linguistically,  this  information  lies  in  the 
"intonation"  and  "emphasis"  of  the  sentence,  and  this  is  also  explicitly 
encoded  in  the  acoustics. 

When  perceiving  a sentence,  a listener  uses  all  these  cues, 
phonemic,  syntactic,  and  semantic,  to  help  him  understand  the  utterance. 
All these levels are highly redundant, and, in some cases, a great deal
of acoustic distortion can occur without affecting the intelligibility
or even the quality of the speech. However, in other cases, very
slight distortions, such as those which affect the perception of syntactic
structure, can cause complete loss of intelligibility. What is
important in understanding the effect of a particular distortion is
understanding the way in which it interacts with the entire complex
speech understanding process. At this point in time, even a simple
complete enumeration of the information in a sentence is beyond the
scope of current theory. This is why the problem of developing general
objective quality measures is so difficult.

This  is  not  to  say,  however,  that  there  is  not  considerable 
knowledge  about  the  acoustic  correlates  of  the  features  of  speech.  It 
is well established that the phonemic information is primarily found in
the acoustic filtering effect of the upper vocal tract, and hence, in
the short time spectral envelope of the speech. Likewise, it is well
known that phase information, other than pitch, is not perceivable [2.22].
Also, it has been well demonstrated that a great deal of information
about consonantal identities is found in the formant behavior of the
adjacent vocalics. But there are other phonemic acoustic correlates in
English besides the spectral envelope. For example, voicing information
in consonants is found in the durations of adjacent vowels and in the
local pitch contour [2.30].

The major acoustic correlates of syntactic structure, intonation,
and emphasis are pitch, vowel durations, and intensity. Of these
correlates, pitch is by far the strongest [2.31][2.32], followed by duration,
and then intensity. There is also evidence that there are some effects
in the spectral envelope which are involved in the perception of these
"supersegmentals," though these are small.

When developing objective quality measures for intermediate rate
and low rate digitization systems an important point is that, due to the
nature of the systems themselves, only certain classes of distortions can
occur. For example, phoneme durations, which are very important in
perception of both phonemic and structural information, are not altered
by coding. In vocoder systems, where the spectral envelope, pitch and
excitation, and gain information are separated naturally as part of the
digitization process, the mapping of the various parameters onto the
perceptual domain is relatively easy to characterize. To detect
distortion related to phonemic perception, spectral distance measures
seem most important. Since the pitch contour plays such an important
role in perception, some sort of excitation comparison should also be
used. Since gain is relatively less important, it is expected that
only gross gain errors should be detected.

In the case of waveform coders, the distortions are not so easily
related to perception. Pitch information is not likely to be affected,
but simple signal/noise ratios are not obviously good candidates for
quality measures. A more likely candidate might be a measure based on
the noise spectrum at the receiver.

2.2.2  Specific  Objective  Quality  Measures 

In  this  section,  all  of  the  objective  quality  measures  tested 
in this study will be presented. Not all of the measures studied were
metrics. In order to qualify as a true metric, a distortion
measure,  D(X,Y),  between  two  signals,  X and  Y,  must  meet  the  following 
conditions : 


1. $D(X,Y) = 0$ iff $X = Y$; $D(X,Y) > 0$ if $X \neq Y$

2. $D(X,Y) = D(Y,X)$

3. $D(X,Y) \leq D(X,Z) + D(Z,Y)$.

Some  of  the  distortion  measures  in  this  study  meet  these  requirements, 
while  others  do  not. 
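As a rough illustration of the three conditions, the following Python sketch checks them by brute force on a handful of scalar values. The function names, the sample levels, and the two toy distances (an absolute difference and a plain ratio) are assumptions made for this example and are not taken from the study.

```python
import itertools


def violated_metric_conditions(dist, points, tol=1e-12):
    """Return the names of the metric conditions that `dist` breaks on `points`."""
    broken = set()
    for x, y in itertools.product(points, repeat=2):
        d = dist(x, y)
        if x == y and abs(d) > tol:
            broken.add("identity")       # condition 1: D(X,X) must be 0
        if x != y and d <= tol:
            broken.add("positivity")     # condition 1: D(X,Y) > 0 for X != Y
        if abs(d - dist(y, x)) > tol:
            broken.add("symmetry")       # condition 2
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, y) > dist(x, z) + dist(z, y) + tol:
            broken.add("triangle")       # condition 3
    return sorted(broken)


def abs_difference(x, y):
    """Toy difference-type distance."""
    return abs(x - y)


def power_ratio(x, y):
    """Toy ratio-type measure; not expected to be a metric."""
    return x / y


levels = [0.5, 1.0, 2.0, 4.0]  # hypothetical power values
print("difference:", violated_metric_conditions(abs_difference, levels))
print("ratio:     ", violated_metric_conditions(power_ratio, levels))
```

A check of this kind can only show that a measure fails a condition on the sampled points; passing it does not prove the measure is a metric.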

2.2.2.1 Spectral Distance Measures

Spectral  distance,  in  this  context,  refers  to  a distance  measure 
between  a sampled  envelope  of  the  source  or  unprocessed  speech  signal 
and  a degraded  form  of  the  signal.  Since  there  are  many  methods  for 
approximating the "short time spectrum" of a signal, there are
correspondingly many metrics which may be formed from a speech signal. A
good measure should have two characteristics: it should consistently
reflect  perceptually  significant  distortions  of  different  types;  and, 
it  should  be  highly  correlated  with  subjective  quality  results. 

A total of sixteen spectral distance measures and related
measures were studied in this project. Let $V(\theta)$, $-\pi \leq \theta \leq \pi$,
be the short time power spectral envelope for a frame of the original
sentence and let $V'(\theta)$ be the power spectral envelope for the
corresponding frame of the distorted sentence. In this discussion, it is
assumed that the proper time synchronization has occurred, and that
$V(\theta)$ and $V'(\theta)$ are for the same frame of speech. Because gain
variations are not of interest here, the spectra $V(\theta)$ and $V'(\theta)$
may be normalized to have the same arithmetic mean either in a linear or a
log form. A geometric distance between the distorted and original spectra
may be taken in several ways, including the direct spectral distance

$$ D(\theta) = V(\theta) - V'(\theta), \tag{2.1} $$

the difference in the log spectra

$$ D(\theta) = 10 \log_{10} V(\theta) - 10 \log_{10} V'(\theta), \tag{2.2} $$

the source normalized distance measure

$$ D(\theta) = \frac{V(\theta) - V'(\theta)}{V(\theta)}, \tag{2.3} $$

and the ratio of power spectra

$$ D(\theta) = \frac{V(\theta)}{V'(\theta)}. \tag{2.4} $$

Of these measures, 2.1 and 2.2 can form the basis for true metrics,
while 2.3 and 2.4 cannot. A large class of distance measures can be
defined as the weighted $L_p$ norm "$d_p$" by

$$ d_p(V,V',W) = \left[ \frac{\int_{-\pi}^{\pi} W(V,V',\theta)\, |D(\theta)|^p \, d\theta}{\int_{-\pi}^{\pi} W(V,V',\theta)\, d\theta} \right]^{1/p} \tag{2.5} $$


where $W(V,V',\theta)$ is a weighting function which allows functional
weighting based on either of the power spectral envelopes or on frequency. In
this study, $W(V,V',\theta) = 1$, and 2.5 reduces to

$$ d_p(V,V') = \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |D(\theta)|^p \, d\theta \right]^{1/p}. \tag{2.6} $$

Clearly, the higher the value of "p," the greater the emphasis on large
spectral distances. This measure may be digitally approximated by
sampling $D(\theta)$ at $M$ frequencies $\theta_m$, giving

$$ d_p(V,V') \approx \left[ \frac{1}{M} \sum_{m=1}^{M} |D(\theta_m)|^p \right]^{1/p}. \tag{2.7} $$
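The sampled form of the unweighted $L_p$ distance can be sketched in a few lines of Python. This is a minimal sketch that assumes both envelopes have already been sampled on the same frequency grid and normalized for gain; the function names, the `kind` labels, and the example envelopes are hypothetical.

```python
import math


def pointwise_distance(v, v_prime, kind="log"):
    """D(theta_m) for one frequency sample, following equations 2.1 to 2.4."""
    if kind == "linear":        # 2.1: direct spectral distance
        return v - v_prime
    if kind == "log":           # 2.2: difference of log spectra, in dB
        return 10.0 * math.log10(v) - 10.0 * math.log10(v_prime)
    if kind == "normalized":    # 2.3: source normalized distance
        return (v - v_prime) / v
    if kind == "ratio":         # 2.4: ratio of power spectra
        return v / v_prime
    raise ValueError(f"unknown distance kind: {kind}")


def lp_distance(envelope, envelope_prime, p=2.0, kind="log"):
    """Unweighted sampled L_p distance between two power spectral envelopes."""
    if len(envelope) != len(envelope_prime):
        raise ValueError("envelopes must be sampled on the same grid")
    total = sum(abs(pointwise_distance(v, w, kind)) ** p
                for v, w in zip(envelope, envelope_prime))
    return (total / len(envelope)) ** (1.0 / p)


original = [1.0, 4.0, 2.0, 0.5]   # hypothetical envelope samples
distorted = [1.0, 2.0, 2.0, 1.0]
for p in (1, 2, 8):
    print(p, lp_distance(original, distorted, p=p))
```

Raising `p` moves the result toward the largest single deviation, which is the emphasis on large spectral distances noted in the text.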
2.2.2.1.1 The LPC Spectral Distance Measures

Since the output speech waveform is a convolution between a
spectral envelope "filter" and an excitation signal, a deconvolution
is necessary for spectral envelope comparisons. The LPC analysis is
itself a parametric spectral estimation process, and may be used to
extract an approximation of the spectral envelope. The block diagram
for an LPC spectral analysis system is given in Figure 2.1. If the
LPC parameters are $(a_1, \ldots, a_N)$, then the spectrum function
$V(\theta)$ is given by

$$ V(\theta) = \left| \frac{1}{A(e^{j\theta})} \right|^2 \tag{2.8} $$

where

$$ A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}. \tag{2.9} $$

This  approximation  can  be  used  to  calculate  any  of  the  measures  suggested 
above. 
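To make equations 2.8 and 2.9 concrete, the following Python sketch samples the LPC power spectral envelope directly from a set of predictor coefficients. It assumes the sign convention $A(z) = 1 - \sum_i a_i z^{-i}$ and unit gain; the function name and the two-pole example coefficients are illustrative only.

```python
import cmath
import math


def lpc_power_envelope(coeffs, num_points=64):
    """Sample V(theta) = |1 / A(e^{j theta})|^2 at num_points values of
    theta in [0, pi), with A(z) = 1 - sum_i a_i z^{-i}."""
    envelope = []
    for m in range(num_points):
        theta = math.pi * m / num_points
        a_of_z = 1.0 - sum(a * cmath.exp(-1j * theta * (i + 1))
                           for i, a in enumerate(coeffs))
        envelope.append(1.0 / abs(a_of_z) ** 2)
    return envelope


# Hypothetical two-pole resonance at angle pi/4 with pole radius 0.9.
radius, angle = 0.9, math.pi / 4
coeffs = [2.0 * radius * math.cos(angle), -radius ** 2]
envelope = lpc_power_envelope(coeffs)
peak_index = max(range(len(envelope)), key=envelope.__getitem__)
print("peak at theta =", math.pi * peak_index / len(envelope))
```

Two envelopes computed this way, one from the original frame and one from the distorted frame, can then be compared with any of the pointwise distances 2.1 to 2.4.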

There are a number of additional measures which can be calculated
from A(z). These are not true spectral distance metrics or measures,
but are related, and have the additional feature that they are easy to
calculate. Several of these measures are simply geometric distances in
the parameter domains, such as feedback coefficients, PARCOR coefficients,
area functions, and pole locations. In each of these cases, we can
define $d_p$ as

$$ d_p = \left[ \sum_{m=1}^{N} |C_m - C'_m|^p \right]^{1/p} \tag{2.10} $$

where $C_m$ is the $m$th parameter (PARCOR coefficient, area function, etc.)
of the original signal, $C'_m$ is the corresponding parameter of the
distorted signal, and N is the number of parameters involved in the
representation.

Another related approach is illustrated in Figure 2.2. The
original speech signal is analyzed using an LPC analysis, and the
inverse filtered waveform is formed by

$$ e_i = s_i - \sum_{j=1}^{N} a_j s_{i-j} \tag{2.11} $$

where $a_j$ is the $j$th LPC coefficient and $s_i$ is the $i$th speech sample.
This optimal filter is then used to inverse filter the distorted
waveform $s'_i$, resulting in

$$ e'_i = s'_i - \sum_{j=1}^{N} a_j s'_{i-j}. \tag{2.12} $$

[Figure 2.2: System for computing ...]

The measure which is used is then

[Equation 2.13, defining $d_p$, is illegible in the source]

where L is the total number of samples in the utterance.
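The inverse filtering steps of equations 2.11 and 2.12 can be sketched in Python as follows. Because equation 2.13 is not legible, the final ratio of residual energies is only an assumed stand-in, suggested by the "LPC error power ratio" named in the abstract, and not necessarily the measure used in the study. The autocorrelation method, the predictor order, all function names, and the test signals are likewise assumptions of this example.

```python
import math


def autocorrelation(signal, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_N for the model s_i ~ sum_j a_j s_{i-j}."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / error
        updated = a[:]
        updated[m] = k
        for j in range(1, m):
            updated[j] = a[j] - k * a[m - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def inverse_filter(signal, coeffs):
    """e_i = s_i - sum_j a_j s_{i-j}; samples before the frame count as zero."""
    residual = []
    for i, s in enumerate(signal):
        predicted = sum(a * signal[i - j]
                        for j, a in enumerate(coeffs, start=1) if i - j >= 0)
        residual.append(s - predicted)
    return residual


def residual_energy_ratio(original, distorted, order=2):
    """Assumed stand-in for 2.13: energy of e' divided by energy of e."""
    coeffs = levinson_durbin(autocorrelation(original, order), order)
    e = inverse_filter(original, coeffs)
    e_prime = inverse_filter(distorted, coeffs)
    return sum(x * x for x in e_prime) / sum(x * x for x in e)


clean = [0.9 ** t * math.cos(0.3 * t) for t in range(200)]   # toy resonance
noisy = [x + 0.1 * (-1) ** t for t, x in enumerate(clean)]   # added distortion
print(residual_energy_ratio(clean, noisy))
```

The filter is fitted to the original signal only, so a distorted signal whose spectral envelope differs from the original tends to leave more energy in the residual.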

2.2.2.1.2 Cepstral Spectral Distance Measures

Another technique used often for deconvolving the spectral
envelope from the excitation is cepstral analysis [2.33][2.34]. The
analysis system for cepstral analysis is shown in Figure 2.3. By
Parseval's Theorem, $d_2$ can be calculated from the cepstrum by

$$ d_2 = \left[ \sum_{k=-\infty}^{\infty} (c_k - c'_k)^2 \right]^{1/2} \tag{2.14} $$

where $c_k$ and $c'_k$ are the cepstral components for the original and the
test signal respectively. For the same reason that cepstral deconvolution
works well on speech, only a few coefficients need to be used
(< 40) to calculate $d_2$. Since the cepstral measure is computationally
intensive (2 FFT's per frame) and since it has been shown that $d_2$
calculated from A(z) is very highly correlated with $d_2$ calculated from
the cepstrum [2.35], the cepstral measure does not appear very
attractive on its own. However, cepstral analysis is attractive for
excitation feature extraction (see 2.2.2.2.2); since the low order
cepstral coefficients are a by-product of that analysis, and since CCD's
offer potential for cheap FFT's using the Chirp-Z Transform,
cepstral measures are worthy of consideration.
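A truncated cepstral distance of this kind might be computed as in the Python sketch below. It uses a direct DFT for clarity rather than an FFT, takes the cepstrum of the log magnitude spectrum (which differs from the log power spectrum only by a factor of two), and omits the zeroth coefficient so that a pure gain change gives zero distance. These choices, the function names, and the test frames are assumptions of the example.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(n^2), adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    n = len(frame)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def cepstral_distance(frame, frame_prime, num_coeffs=20):
    """Truncated d_2 over coefficients 1..num_coeffs; c_0 (gain) is skipped."""
    c = real_cepstrum(frame)
    c_prime = real_cepstrum(frame_prime)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(c[1:num_coeffs + 1],
                                         c_prime[1:num_coeffs + 1])))


frame = [0.9 ** t * math.cos(0.5 * t) for t in range(64)]       # toy frame
smoothed = [0.5 * (frame[t] + frame[t - 1]) if t else 0.5 * frame[0]
            for t in range(64)]                                   # low-passed copy
print(cepstral_distance(frame, smoothed))
```

Only the low order coefficients enter the sum, which matches the observation that fewer than 40 coefficients are needed.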

2.2.2.2 Excitation Feature Extraction

Pitch is a very important acoustic correlate of many supersegmental
features, and distortions in the pitch contour are easily perceivable
and very detrimental to quality. Pitch estimation errors and voiced/
unvoiced  errors  may  occur  in  any  pitch  excited  vocoder  system.  Hence, 
it  is  of  interest  to  investigate  objective  measures  for  comparing 
excitation  features  for  those  systems  where  it  is  applicable. 

The  ideal  solution  to  this  problem  would  be  to  generate  high 
quality  pitch  contours  for  the  original  utterances,  and  to  compare 
these to the values used by the vocoder synthesis algorithm. However,
since the excitation parameters are not explicitly available in vocoder
systems, and since the excitation data is not available for the systems
used in the PARM test, this approach is not practical.

A second possibility is to apply a high quality pitch detector
to both the original and the distorted speech, and to compare these
results. A system which compares pitch excitation contours was developed
at Georgia Tech under a previous effort [2.36] along with several high
quality pitch detection programs. The statistics computed by the
pitch comparison program (PCHECK) are enumerated in Table 2.1. This
approach was studied experimentally using the Hard Limited Autocorrelation
Pitch Detector [2.36] and the Multiband Pitch Detector [2.36].
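As an illustration only, the kind of statistics listed in Table 2.1 can be sketched as below. This is not the PCHECK program itself: the function name `pitch_comparison_stats`, the convention that a pitch period of 0 marks an unvoiced frame, and the default gross-error threshold are all assumptions made for the sketch.

```python
import numpy as np

def _mean(values):
    """Mean of an array, or 0.0 when the array is empty."""
    return float(values.mean()) if values.size else 0.0

def pitch_comparison_stats(ref, test, gross_threshold=10.0):
    """Compare two frame-level pitch-period contours (hypothetical helper).

    A period of 0 marks an unvoiced frame (an assumption of this sketch).
    Errors are measured in samples of pitch period.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    ref_voiced, test_voiced = ref > 0, test > 0
    both_voiced = ref_voiced & test_voiced

    err = np.abs(ref - test)[both_voiced]      # errors where both are voiced
    gross = err[err > gross_threshold]         # errors above the threshold
    subtle = err[(err > 0) & (err <= gross_threshold)]

    return {
        "n_pitch_errors": int(np.count_nonzero(err)),
        "avg_error_voiced": _mean(err),
        "n_gross": int(gross.size),
        "avg_gross": _mean(gross),
        "n_subtle": int(subtle.size),
        "avg_subtle": _mean(subtle),
        "n_voicing_errors": int(np.sum(ref_voiced != test_voiced)),
        "std_error_voiced": float(err.std(ddof=1)) if err.size > 1 else 0.0,
    }
```

The two contours would come from running the same pitch detector on the original and on the distorted utterance, frame by frame.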

                            STATISTICS

   1.  Total number of pitch errors
   2.  The average errors per sample in voiced regions
   3.  The number of gross errors (greater than a threshold)
   4.  The average gross errors
   5.  The number of subtle errors (less than a threshold)
   6.  The average subtle errors
   7.  The number of voicing errors
   8.  Sample standard deviations from the above averages

Table 2.1  Statistics Calculated by "PCHECK" Pitch Comparison Program

A third possible approach involves developing a measure for
excitation differences which does not depend on any pitch detection
algorithm. The idea is to use a deconvolution technique which is aimed
at retrieving the excitation representation rather than the spectral
envelope representation. The cepstrums of the two speech signals have
many features which suggest that they might be good candidates for an
excitation distance measure. First, they have a region in which the
signal characteristics are almost entirely representative of the
excitation function. Second, since this region is easily identifiable, no
pitch decision or voiced/unvoiced decision is necessary. Third, the
shape of the cepstrum in the excitation region contains some additional
information about the excitation besides just pitch. Last, the
computation of the cepstrum leads to a spectral envelope representation which
might also be used as part of a spectral distance measure.

The way in which an excitation distance measure might be
calculated is illustrated in Figure 2.4. After the cepstrum of each of the two
signals is calculated, a smoothing filter is used to make the measure
less severe. Next, a distance metric is calculated by

$$ d_p = \left[ \frac{\sum_{k=N_1}^{N_2} W(C,C',k)\,(C_k - C'_k)^p}{\sum_{k=N_1}^{N_2} W(C,C',k)} \right]^{1/p} \tag{2.15} $$

In this measure, $C_k$ and $C'_k$ are the cepstral coefficients for the original
and distorted speech respectively, and $W(C,C',k)$ is a weighting function.
In this study, the weighting functions which were studied were $W(C,C',k) = 1$
(no weight) and $W(C,C',k) = C_k$, which weights samples near pitch peaks
more than those in unvoiced regions.
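A minimal sketch of a weighted cepstral distance of this form follows. It is an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, the smoothing filter is omitted, the difference is taken in absolute value so that odd p behaves sensibly, and the magnitude of $C_k$ is used as the weight so that the weights stay non-negative.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

def excitation_distance(c_ref, c_test, n1, n2, p=2, weighted=False):
    """Weighted L_p distance over cepstral indices n1..n2 (inclusive).

    weighted=False uses W = 1; weighted=True uses W = |C_k| of the
    reference cepstrum (the absolute value is an assumption of this sketch).
    """
    c_ref = np.asarray(c_ref, dtype=float)
    c_test = np.asarray(c_test, dtype=float)
    k = slice(n1, n2 + 1)
    diff = np.abs(c_ref[k] - c_test[k]) ** p
    w = np.abs(c_ref[k]) if weighted else np.ones(n2 - n1 + 1)
    total_weight = np.sum(w)
    if total_weight == 0:
        return 0.0
    return float((np.sum(w * diff) / total_weight) ** (1.0 / p))
```

The limits `n1` and `n2` would be chosen to bracket the range of expected pitch periods, so that only the excitation region of the cepstrum contributes.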

2.2.2.3 Noise Power Measures

Traditionally, signal-to-noise ratio has been one of the
predominant measures for determining the performance of waveform coding
systems. This measure is attractive since it is so easily calculated
and since values for this measure are known for most appropriate
systems. It is unattractive since it is difficult to evaluate in
light of what is known about speech perception.

A far more interesting approach might be to develop a measure
based on the coloration of the noise as well as its power. In short,
if noise is defined as

$$ n_i = s_i - s'_i $$

where $s_i$ and $s'_i$ are samples of the original and distorted speech
respectively, then the noise spectral envelope $N(\theta)$ could be found
using LPC or cepstral techniques as before. A measure could be
defined such that


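$$ N_p = \frac{\int_{-\pi}^{\pi} W(\theta)\,N^p(\theta)\,d\theta}{\int_{-\pi}^{\pi} W(\theta)\,d\theta} \tag{2.17} $$

and

$$ d_p = \left[ N_p \right]^{1/p} \tag{2.18} $$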


This would be attractive since it would allow some measure of the
spectral characteristics of the noise, which is very likely to have
perceptual impact. If $W(\theta) = 1$, then, by Parseval's Theorem, this measure
becomes the signal-to-noise ratio for p = 2.
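The Parseval relation invoked here can be checked numerically with a short sketch. The function name `noise_measure` is hypothetical, and the raw DFT magnitude of the noise stands in for the LPC or cepstral envelope $N(\theta)$, which is an assumption of this example.

```python
import numpy as np

def noise_measure(original, distorted, p=2, weight=None):
    """Weighted spectral noise measure on the DFT grid (illustrative sketch).

    Returns (N_p, d_p). The DFT magnitude of the noise is used in place of
    a smoothed spectral envelope, which is an assumption of this sketch.
    """
    noise = np.asarray(original, dtype=float) - np.asarray(distorted, dtype=float)
    mag = np.abs(np.fft.fft(noise))            # N(theta) sampled at DFT bins
    w = np.ones_like(mag) if weight is None else np.asarray(weight, dtype=float)
    n_p = np.sum(w * mag ** p) / np.sum(w)     # discrete form of the integrals
    return float(n_p), float(n_p ** (1.0 / p))
```

With unit weighting and p = 2, `n_p` equals the summed squared noise samples, so a conventional signal-to-noise ratio follows as `10 * np.log10(np.sum(original ** 2) / n_p)`. A non-uniform `weight` array is where perceptual coloration would enter.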

Though this represented a very interesting area for study, very
little was done on noise measurements in this study. This is because
the data base associated with the PARM was not in a form to make the
necessary computations reasonable.

2.3  Initial Qualitative Studies and Controlled Distortions

This section describes two phases of the experimental study. In
the first phase, example sentences from various systems were digitized
from analog magnetic tape, and various forms of gain measures and
spectral measures were applied and studied. In the second phase, the
measures presented in the previous section (2.2) were applied to
sentences which contained controlled distortions to test these measures
for consistency in measuring those distortions, to check the
measurement of combined distortions, and, by using the histograms of time
behavior of the various measures, to determine a potential resolving
power for each measure.

2.3.1  Qualitative  Studies 

In the initial study, a total of 20 sentences from two speakers
and five systems were digitized from analog tape (digital tape
representations were not available at that time), and stored on disk. (See
Table 2.2.) A subgroup of those sentences was then analyzed for energy
contours and for spectral representations and cepstral spectral analysis.

The energy was measured by applying Kaiser windows [2.37] of
various lengths as FIR filters to the squared waveforms. The window
lengths were adjusted such that pitch periods were not obvious in the
energy representations. These energy plots were then used to try to
synchronize the sentences with one another.
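The energy measurement described above can be sketched in a few lines. The function name, the default window length, and the Kaiser `beta` value are assumptions for illustration; the report does not give the values actually used.

```python
import numpy as np

def energy_contour(x, window_len=401, beta=6.0):
    """Short-time energy: squared waveform smoothed by a Kaiser-window FIR filter.

    window_len and beta are illustrative values, not those of the study.
    The window is normalized to unit sum so a constant-amplitude signal
    yields its squared amplitude.
    """
    w = np.kaiser(window_len, beta)
    w = w / w.sum()
    return np.convolve(np.asarray(x, dtype=float) ** 2, w, mode="same")
```

In practice the window length would be increased until individual pitch periods no longer show up as ripple in the contour.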

                              TEST UTTERANCES

   HI ANCHOR                  LL1*   LL2   CH1   CH2
   CVSD (16 KBPS)             LL1*   LL2   CH1   CH2
   CVSD (9.6 KBPS)            LL1*   LL2   CH1   CH2
   LONGBRAKE (2.4 KBPS)       LL1*   LL2   CH1   CH2
   HY2 (2.4 KBPS)             LL1*   LL2   CH1   CH2

   * Part of Subtest Group

Table 2.2  Input Sentences Used in the Initial Qualitative Studies

Several results came out of this study. First, not unexpectedly,
the energy plots for the waveform coders (CVSD 16 and CVSD 9.6) were
very similar to that of the high anchor (original). Second, the energy
plots for the vocoders (Longbrake 2.4 and Hy2 2.4) were very different
from the high anchor and very different from each other. Attempts to
synchronize the utterances using the gain waveforms result in different
synchronizations than if the waveforms are synchronized visually. The
point here is that the local intensity of a speech waveform is
not a highly perceivable quantity, and vocoders take advantage of this
by doing relatively poor gain estimation. This points out that energy is
probably not a good candidate for an objective quality measure.

Another point should be made here. The synchronization efforts
here point up clearly that the use of analog magnetic tape for recording
utterances is generally unacceptable. Effects which (we presume) are
due to the stretching of the analog tapes prevented synchronization from
being maintained for more than 1-2 seconds. Carefully synchronized
digital playback and recording systems must be used as a basis for
reasonable objective measures.

In the second part of this study, 10 pole LPC spectral analysis
and 40 coefficient cepstral spectral analysis were performed on the five
test sentences, and 3-D perspective plots were produced. These plots
are shown in Figures 2.5-2.14. Several points were observed from these
plots. First, the peaks in the LPC spectra were generally sharper
than those of the cepstral spectra. Second, however, the cepstral
spectra, on the whole, had much more local variation than the LPC
spectra. Third, the spectral variations caused by the waveform coders
were more noticeable in the LPC case than in the cepstral case. On the
whole, no clear advantage for either of the two analyses could be found
from these plots.

FIGURE 2.9   LPC SPECTROGRAM OF LONGBRAKE AT 2.4 KBPS (LL1)

CEPSTRAL SPECTROGRAM OF HIANCHOR (LL1)

FIGURE 2.12  CEPSTRAL SPECTROGRAM OF CVSD AT 16 KBPS (LL1)

FIGURE 2.13  CEPSTRAL SPECTROGRAM OF HY2 AT 2.4 KBPS (LL1)

FIGURE 2.14  CEPSTRAL SPECTROGRAM ...


2.3.2  The Controlled Distortion Experiment

The purpose of the controlled distortion experiments was to test
the candidate measures discussed in Section 2.2 as to their resolving
power for measuring certain classes of distortions. In all cases, the
"original" was taken to be the output of a 12 tap LPC synthesis program
where the coefficients were unquantized and the pitch was extracted by
hand. Two sets of signals were used. One set consisted of four
synthetic vowels (/i/, ..., /u/, and /æ/), the other of two sentences, one
spoken by a male speaker and one spoken by a female speaker. In all
cases, five classes of distortions were applied: bandwidth distortion;
frequency distortion; pitch distortion; low pass filtering distortion;
and additive noise.

2.3.2.1 Bandwidth Distortion

Distortion in the bandwidth of formants is a common occurrence
in vocoders. To test this type of distortion, the unit circle was
effectively expanded by transforming each LPC coefficient by

$$ a_k \rightarrow a_k \alpha^k . \tag{2.19} $$

In this experiment, the four values of $\alpha$ which were used were .99, .98,
.97, and .95. The first two values introduced no perceivable distortion.
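The effect of this transformation on the poles can be seen in a small sketch. The function name `expand_bandwidth` and the coefficient convention (the array starts with the leading 1 of A(z), followed by a_1 ... a_p) are assumptions of the example.

```python
import numpy as np

def expand_bandwidth(a, alpha):
    """Scale LPC coefficient a_k by alpha**k.

    `a` holds the coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p,
    leading 1 included (a convention assumed for this sketch). Every root
    of A(z) is multiplied by alpha, so for alpha < 1 the poles of 1/A(z)
    move toward the origin and the formant bandwidths widen.
    """
    a = np.asarray(a, dtype=float)
    return a * alpha ** np.arange(len(a))
```

For example, a pole pair at radius 0.95 moves to radius 0.95 x 0.97 when `alpha = 0.97`, while the pole angles, and hence the formant frequencies, are unchanged.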

2.3.2.2 Frequency Distortion

The frequency distortion was carried out by up or down sampling
the impulse response of the LPC synthesizer. Figure 2.15 shows the
procedure. First, an FIR (256 point) approximation for the IIR impulse
response was calculated. Then a zero padded interpolation was performed
using a 1000 point Kaiser window designed linear phase low pass filter.
The resulting modified impulse response was used to synthesize the
speech samples. Sampling ratios of 49/50, 50/49, 9/10, and 10/9 were
used.

2.3.2.3 Pitch Distortion

Pitch distortion was applied by allowing the pitch period to
systematically increase or decrease over the voiced regions. This results
in pitch distortions which increased with time in each utterance. The rates at
which the periods were allowed to vary were +1 sample every 10 voiced
frames, +1 sample every 4 voiced frames, -1 sample every 10 voiced
frames, and -1 sample every 4 voiced frames.
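One possible reading of this schedule is sketched below. The function name, the use of 0 for unvoiced frames, and the choice to apply the first change only after the first `every` voiced frames are assumptions; the report does not specify these details.

```python
def distort_pitch(periods, every, step):
    """Drift the pitch period by `step` samples per `every` voiced frames.

    `periods` is a list of per-frame pitch periods in samples, with 0
    marking an unvoiced frame (an assumption of this sketch). Unvoiced
    frames are passed through and do not advance the drift.
    """
    distorted = []
    voiced_count = 0
    for period in periods:
        if period == 0:                       # unvoiced: leave untouched
            distorted.append(0)
            continue
        offset = step * (voiced_count // every)
        distorted.append(max(1, period + offset))
        voiced_count += 1
    return distorted
```

Calling `distort_pitch(periods, every=10, step=1)` and `distort_pitch(periods, every=4, step=-1)` would correspond to the slowest increasing and fastest decreasing conditions respectively.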

2.3.2.4 Low Pass Filter Distortion

Bandlimiting distortions are very common in speech communication
systems, and hence worthy of study. The filters used were all 10th
order recursive digital elliptic filters with rejection bands at -60 dB.
In all, four filters were used with cutoffs at 1.4 kHz, 1.8 kHz, 2.2 kHz,
and 2.8 kHz.

2.3.2.5 Additive White Noise Distortion

White Gaussian noise was also added to the test signals. Four
noise levels were used which resulted in signal to noise ratios of
approximately 13 dB, 10 dB, 7 dB, and 3 dB.
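A minimal sketch of adding white Gaussian noise at a prescribed signal to noise ratio is given below. The function name and the use of NumPy's random generator are assumptions of the example, not a description of the original software.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled to give the requested SNR in dB.

    The noise is scaled from the measured powers, so the realized SNR of
    the returned signal matches snr_db up to rounding error.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise
```

Passing a seeded generator, for example `np.random.default_rng(1)`, makes the distorted test signals reproducible.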

2.3.3  The Experimental Results

In all, six utterances, four vowels .768 seconds in length and
two sentences 3.072 seconds in length, were used as originals. A total
of four distortions for each of the five classes were applied to the six
speech samples, giving 120 distorted samples. The purpose of the vowel
distortion study was to measure the effects of each measure in a "micro"
sense in order to compare resolving powers of the different measures.
The purpose of the full sentence distortions was to measure the "macro"
behavior of each objective measure. In all cases, the total sentence ...

$$ W(m) = G_m , \tag{2.21} $$

where $G_m$ is the LPC gain of the original sentence in the $m$th frame. The
LPC analyses were always done with a Hamming windowed, autocorrelation
LPC with a frame interval of 256 samples and a window width of 256
samples. The gain weighting here was included to see how the overall
outcome would be affected as a matter of academic interest. The
hypothesis is that, since the vocalics contain a large portion of the
information, and since the gain is always greater for vocalics, then a
gain weighted measure might be more highly correlated with perceptual
results. This experiment, clearly, gives no new information on this
hypothesis, but it does show to what extent gain weighting changes the
final objective quality estimate.
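How gain weighting changes a sentence-level figure can be illustrated with a short sketch. Because the original expression for the total sentence measure did not survive in this copy, the normalized weighted mean used here, with the frame gain as the weight, is an assumption, as is the function name.

```python
import numpy as np

def sentence_measure(frame_distances, frame_gains=None):
    """Combine per-frame distances into one sentence-level value.

    Without gains this is a plain average over frames. With gains, each
    frame is weighted by its LPC gain and the sum is normalized by the
    total gain (a form assumed for this sketch).
    """
    d = np.asarray(frame_distances, dtype=float)
    if frame_gains is None:
        return float(d.mean())
    g = np.asarray(frame_gains, dtype=float)
    return float(np.sum(g * d) / np.sum(g))
```

If the large distances fall in low-gain frames, the weighted value drops below the plain average, which is the behavior the gain weighting is meant to expose.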


... variables, all with the same standard deviation. The sample variance
was calculated from

$$ \hat{\sigma}_p^2 = \frac{1}{M-1} \sum_{m=1}^{M} \left( d_{p,m} - D_p \right)^2 . $$

The random variable

$$ t = \frac{D_p - \bar{D}_p}{\hat{\sigma}_p / \sqrt{M}} $$

is t distributed (see Chapter 3) with zero mean and unit variance.
A confidence interval for $\bar{D}_p$, the true mean for $D_p$, for a significance
level $\alpha$ ($\alpha$ = .01 and .05) can be calculated from

$$ D_p - U_{\alpha M} \frac{\hat{\sigma}_p}{\sqrt{M}} < \bar{D}_p < D_p - L_{\alpha M} \frac{\hat{\sigma}_p}{\sqrt{M}} $$

where $L_{\alpha M}$ and $U_{\alpha M}$ are the lower and upper significance limits for a t
distributed random variable ($\mu = 0$, $\sigma = 1$) for M points and probability
$\alpha$.
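A confidence interval of this kind can be sketched as follows. The function name is hypothetical, and the critical value of the t distribution is passed in by the caller (for instance from a printed table) rather than computed; symmetric limits and an M - 1 divisor in the sample variance are assumed.

```python
import math

def confidence_interval(distances, t_crit):
    """Two-sided confidence interval for the mean of per-frame distances.

    t_crit is the upper significance limit of the t distribution for the
    chosen level and M - 1 degrees of freedom, supplied by the caller.
    """
    m = len(distances)
    mean = sum(distances) / m
    variance = sum((d - mean) ** 2 for d in distances) / (m - 1)
    half_width = t_crit * math.sqrt(variance / m)
    return mean - half_width, mean + half_width
```

For five frames at the .05 level the tabulated two-sided limit is about 2.776, so `confidence_interval(d, 2.776)` gives the interval around the frame average.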

2.3.3.1 Results of the Vowel Tests

The results of the vowel tests for frequency distortion and
bandwidth distortion are compiled in Table 2.3, the results for low
pass filtering distortion and noise distortion are given in Table 2.4,
and the results for pitch distortion are given in Table 2.5.

SPECTRAL                     BANDWIDTH DISTORTIONS       FREQUENCY SHIFT DISTORTIONS
DISTORTION                            α                         SHIFT RATIOS
MEASURES                    .99    .98    .97    .95     50/49  49/50   10/9   9/10

D1 LOG LPC         AV.     .076    .13    .22    .37      .08    .07    .91    .83
                   C.I.     .03    .04    .06    .12      .03    .03    .11    .10
D2 LOG LPC         AV.     .061    .21    .24    .46      .11    .10    1.2    .90
                   C.I.     .03    .05    .04    .12      .04    .02    .12    .10
D4 LOG LPC         AV.      .12    .26    .33    .61      .13    .15    1.6    1.3
                   C.I.     .05    .06    .09    .17      .05    .05    .14    .12
D2 LINEAR LPC      AV.     1260   1541   3021   4077     2041   2112   4510   4910
                   C.I.     825   1051   1121   1642      914    921   2013   2412
D2 CEPSTRUM        AV.     .068    .22    .25    .42      .14    .12    1.3    .91
                   C.I.     .03    .05    .06    .13      .03    .03    .11    .11
D2 PARCOR          AV.      1.1    1.6    1.8    2.3      1.5    1.3    3.2    2.1
                   C.I.     .06    .05    .07    .08      .04    .02    1.2    .09
D2 FEEDBACK        AV.      113    191    215    421      104    127    411    402
                   C.I.      61     75    112    181       55     67    172    101
D2 AREA            AV.      1.1    2.2    3.1    5.7      1.4    1.2    3.7    3.2
                   C.I.     0.2    0.2    0.4    1.1      .31    .32    .62    .59
D2 POLE LOCATION   AV.      2.3    2.7    2.9    4.1      2.1    1.9    4.2    3.8
                   C.I.     .93    1.6    1.9    2.2      .91    .80    2.1    2.3

AV. = Average      C.I. = Confidence Interval (.05 Level)

Table 2.3  Results of the Bandwidth Distortions and Frequency Shift
           Distortions on Vowels. All Confidence Intervals are at the
           .05 Level.


SPECTRAL                     BANDLIMIT DISTORTION             NOISE DISTORTION
DISTORTION                      BANDLIMIT (kHz)                   S/N (dB)
MEASURES                    2.8    2.2    1.8    1.4       13     10      7      3

D1 LOG LPC         AV.      7.3   12.1   14.6   16.2      1.7    2.8    5.0    7.6
                   C.I.     1.1    2.4    2.8    3.5      .22    .62    .97   1.81
D2 LOG LPC         AV.    8.[?]   13.3   15.6   17.5      1.9    3.2    5.2    8.6
                   C.I.     1.2    2.3    3.1    3.6      .31    .82    1.4    2.6
D4 LOG LPC         AV.      9.4   14.4   16.7    [?]      2.4    3.6    5.6   10.1
                   C.I.     1.4    2.5    3.5    3.7      .40   1.02   1.05   1.19
D2 LINEAR LPC      AV.     6851   7175   6281   9143     5431   5941   6643   7141
                   C.I.     855    991   1097   1211     2413   2712   3143   4127
D2 CEPSTRUM        AV.      8.8   14.1   16.0   18.1      1.6    3.1    5.2    8.8
                   C.I.     1.3    2.2    3.3    3.6      .33    .91    1.3    2.7
D2 PARCOR          AV.      5.2    5.5    5.9    6.3      3.1    4.3    4.6    [?]
                   C.I.     1.1    1.3    1.2    1.6      .81    .80    .93    .92
D2 FEEDBACK        AV.      827    955   1010   1210      621    751    827    921
                   C.I.     310    341    381    425      125    281    317    397
D2 AREA            AV.      5.3    5.9    6.6    6.9      2.8    2.9    3.1    3.3
                   C.I.     .34    .41    .55    .57      .21    .35    .44    .89
D2 POLE LOCATION   AV.      6.6    6.7    6.7    6.9      4.1    4.4    4.9    5.2
                   C.I.     3.4    3.3    3.3    3.6      2.2    2.1    2.7  2.[?]

AV. = Average      C.I. = Confidence Interval (.05 Level)
[?] = entry or digit missing or illegible; for D2 PARCOR only three of the
      four noise averages survive and their column positions are uncertain.

Table 2.4  Results of the Bandlimit Distortion and Additive Noise
           Distortion on Vowels. All Confidence Intervals Are at the
           .05 Level.

Several points should be made about these results. First, all
of the tests seem to perform relatively well on the two frequency
distortions, with all tests able to resolve the distortions at least
at the .05 level. This point is also illustrated in Figures 2.15 and 2.16,
which show the time behavior of the d2 log LPC measure for the frequency
and bandwidth distortion. As judged by their confidence intervals,
the log LPC measures are the best, while the pole position and feedback
coefficients are the worst for these two frequency distortions. Second,
note that, for low pass filter distortion (Table 2.4), the results are
qualitatively the same as those above. But also note that quantitatively
they are very different, giving much greater spectral distances than the
bandwidth and frequency shift distortions. This can also be seen in
Figure 2.17. This brings up an important, if obvious, point:
low pass filtering distortion swamps the more subtle forms
of frequency distortion. Hence, some bandwidth decision and control
is necessary in these objective tests if the more subtle distortions are
to be measured.

The noise results show some resolving power for the various noise
levels, but a general loss of resolution when compared to the frequency
and bandwidth results. Stated simply, this type of distortion is not
measured well by spectral distance measures, and hence requires a large
sample of speech to detect it properly.

The results of the pitch variation studies presented in Table
2.5 show that essentially no spectral distance measure can detect pitch
errors with the number of samples used in this experiment. This, of
course, was an expected result, and was the reason that the special
pitch tests were included.

The cepstral pitch measure described in Section 2.2.2.2 was
applied to the four pitch distortions using each of the four smoothing
window functions shown in Figure 2.17.

[Figure: d2 SPECTRUM ERROR (dB) versus TIME, one panel per BANDWIDTH
DISTORTION FACTOR]

SPECTRAL                            PITCH DISTORTION
DISTORTION
MEASURES                     10,1    10,-1     4,1     4,-1

D1 LOG LPC         AV.       .071     .064    .073     .072
                   C.I.       .03      .03     .04      .03
D2 LOG LPC         AV.       .079     .081    .076     .078
                   C.I.       .03      .03     .03      .03
D4 LOG LPC         AV.        .09     .092    .084     .092
                   C.I.       .04      .05     .04      .04
D2 LINEAR LPC      AV.        821      871     888      841
                   C.I.       640      510     530      511
D2 CEPSTRUM        AV.        .82      .86     .84      .81
                   C.I.       .03      .03     .04      .03
D2 PARCOR          AV.        .91      .84     .88   .[?]6
                   C.I.       .06      .05     .06      .05
D2 FEEDBACK        AV.         87       88      83       89
                   C.I.        48       51      55       46
D2 AREA            AV.        .91      .96     .01      .86
                   C.I.       .21      .23     .20      .19
D2 POLE LOCATION   AV.        2.1      2.0     2.2      2.3
                   C.I.      1.10     1.02    1.05      .98

AV. = Average      C.I. = Confidence Interval (.05 Level)
[?] = digit illegible

Table 2.5  Results of the Pitch Distortions on Vowels. Note that the
           Distortions are Low, and Increased Distortions Cause No
           Increase in the Measures.


Since this was a time varying distortion, the statistical
analysis used in the spectral distance tests is inappropriate. Figures
2.18-2.21 show the results for the four windows. The basic result
here is that this measure forms a high resolution measure of pitch
errors. For short windows, the measure detects very small errors, but
saturates quickly, hence reporting the same result for all errors.
Longer windows do a better job of quantifying the pitch errors, but do not
pick up small errors well. Probably, since most of the computation in
this measure is in the cepstrum calculation, a reasonable solution
would be to use several windows to better quantify the results.

2.3.3.2 Results of the Sentence Tests

The results of the sentence tests are tabulated in Tables 2.6,
2.7, and 2.8. Qualitatively, these results pretty well mirror the
results of the vowel tests. Quantitatively, however, the confidence
intervals are uniformly larger. The general result here, therefore, is
that larger sample sizes are necessary when dealing with real sentences.

An important result of the sentence tests can be seen from a
comparison of the gain weighted measures to the non gain weighted
measures, as shown in Table 2.9. In nearly every case, the gain
weighting causes the measure to decrease. This means the measure is
being inflated by the low power unvoiced regions which are perceptually
less important than the high power vocalic regions. This means that gain
weighting probably will give better subjective correlation.

FIGURE 2.18  CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR
             DIFFERENT PITCH DISTORTIONS FOR WINDOW NO. 1 (FIGURE 2.3),
             WINDOW LENGTH = 1.

FIGURE 2.21  CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR
             DIFFERENT DISTORTIONS FOR WINDOW NO. 4 (FIGURE 2.3),
             WINDOW LENGTH = 10.


SPECTRAL                     BANDWIDTH DISTORTIONS       FREQUENCY SHIFT DISTORTIONS
DISTORTION                            α                         SHIFT RATIOS
MEASURES                    .99    .98    .97    .95     50/49  49/50   10/9   9/10

D1 LOG LPC         AV.      .54    .88    1.2    1.6      .61    .58    1.7    1.9
                   C.I.     .13    .13    .16    .22      .13    .12    .19    .24
D2 LOG LPC         AV.      .62    .94   1.56    1.9      .71    .68    2.4    2.2
                   C.I.     .12    .14    .17    .23      .14    .13    .27    .28
D4 LOG LPC         AV.      .83   1.21    1.8    2.2      .94   1.02    3.1    3.4
                   C.I.     .13    .16    .19    .24      .18    .16    .29    .29
D2 LINEAR LPC      AV.     2910   3816   4715   6144     3415   2916   6913   6314
                   C.I.    2010   2415   3103   3310     2413   1918   3412   3321
D2 CEPSTRUM        AV.      .75   1.05   1.60    2.0      .82    .77   1.96    2.1
                   C.I.     .14    .14    .19    .23      .15    .16     .3    .29
D2 PARCOR          AV.      2.4    2.9    2.9    4.1      1.9    1.8    4.1    3.2
                   C.I.     1.6    1.5    1.7    2.2      1.2    1.0    2.1    1.8
D2 FEEDBACK        AV.      420    461    520    850      480    455   1023    981
                   C.I.     225    251    312    515      310    295    612    580
D2 AREA            AV.      3.4    3.9    5.9    8.2      3.3    3.5    8.1    8.1
                   C.I.     1.2    1.3    2.4    4.2      1.4    1.1    3.4    4.1
D2 POLE LOCATION   AV.      4.6    4.9    5.4    6.3      4.8    4.6    6.8    6.3
                   C.I.     2.4    3.1    4.1    4.8      3.1    2.8    4.4    4.2

AV. = Average      C.I. = Confidence Intervals

Table 2.6  Results of the Bandwidth Distortions and Frequency Shift
           Distortions on Sentences. All Confidence Intervals are at
           the .05 Level.


SPECTRAL                     BANDLIMIT DISTORTION             NOISE DISTORTION
DISTORTION                      BANDLIMIT (kHz)                   S/N (dB)
MEASURES                    2.8    2.2    1.8    1.4       13     10      7      3

D1 LOG LPC         AV.      7.5   15.4   16.8   17.2      1.1    2.1     --     --
                   C.I.     2.7    5.8     --    9.6      .51    1.2     --     --
D2 LOG LPC         AV.      6.1   16.3   16.9   17.5      1.2    2.4     --     --
                   C.I.     1.3    7.2    7.1    9.2      .62    1.4     --     --
D4 LOG LPC         AV.      8.4   16.2   16.8   17.5      1.6    2.9     --     --
                   C.I.     1.5    6.8    7.5    8.2      .77   1.31     --     --
D2 LINEAR LPC      AV.     8142   9317   9581   9721     4213   5176     --     --
                   C.I.    2014   2713   2312   3140     2913   2310     --     --
D2 CEPSTRUM        AV.      5.4    8.3   12.4   16.3      1.4    2.2     --     --
                   C.I.     1.3    2.2    3.1    4.4      .52    1.3     --     --
D2 PARCOR          AV.      7.1    8.3    8.9    9.2      6.2    6.7     --     --
                   C.I.     3.6    3.9    4.7    5.3      4.4    4.5     --     --
D2 FEEDBACK        AV.     1013   1314   1517   1712      823    941     --     --
                   C.I.     692    851     --   1003      512    590     --     --
D2 AREA            AV.      6.7    7.3    8.2    8.8      4.2    4.4     --     --
                   C.I.     1.3    1.9    2.3    2.6      2.1    2.2     --     --
D2 POLE LOCATION   AV.      7.2    7.7    7.5    7.8      6.3    6.7     --     --
                   C.I.     4.4    4.7    3.9    4.6      3.1    3.6     --     --

AV. = Average      C.I. = Confidence Interval (.05)
-- = entry missing or illegible; where a row has only two surviving
     bandlimit entries below 1.4 kHz, their column positions are uncertain.

Table 2.7  Results of the Bandlimit Distortions and Additive Noise
           Distortion on Sentences. All Confidence Intervals are at
           the .05 Significance Level.


DISTORTION                   NON-GAIN WEIGHTED     GAIN WEIGHTED

Bandwidth .99                       .62                 .38
Bandwidth .98                       .94                 .67
Bandwidth .97                      1.56                1.64
Bandwidth .95                       1.9                1.51
Frequency Shift 50/49               .71                 .37
Frequency Shift 49/50               .68                 .37
Frequency Shift 10/9                2.4                1.92
Frequency Shift 9/10                2.2                2.12
Bandlimit 2.8 kHz                   6.1                 4.3
Bandlimit 2.2 kHz                  16.3                12.4
Bandlimit 1.8 kHz                  16.9                14.7
Bandlimit 1.4 kHz                  17.5                16.8
Noise 13 dB                         1.2                 .82
Noise 10 dB                         2.4                1.81
Noise 7 dB                          4.1                 3.6
Noise 3 dB                        6.[?]                 5.4

[?] = digit illegible

Table 2.9  Comparison of Gain Weighted D2 Log LPC Spectral
           Metrics to Non-Gain Weighted D2 Log LPC Spectral
           Metrics.

2.4  The PARM Correlation Study

As was stated in the introduction, the PARM subjective quality
data base offers a good chance to study the correlation between the
objective measures under consideration and the isometric subjective
results available from the PARM. Since many of the objective measures
under study are computationally intensive, the computer time limited the
total number of speech digitization systems which could be used as part
of the study. In all, eight systems were studied, as shown in Table
2.10. These systems were chosen because (1) they represent a cross-section of
speech digitization techniques, including waveform coders (CVSD), LPC's,
channel vocoders, and APC's, and (2) they overlapped with the
systems used in the development of a parametric quality test, called the
"QUART" Test [2.24]. This allows some minimal correlation studies between
the objective quality measures produced here and the parametric results
available from the QUART test.

2.4.1  The PARM Data Base

The PARM data base arrived at Georgia Tech as fourteen boxes of
cards, with control cards for processing under an IBM operating system.
Since correlation studies require many accesses of the data base, and
since the accesses are random, a linear data base such as that
represented by the cards is unacceptable. An acceptable data base
organization must (1) be stored in numeric (two's complement) form rather than
character form, and (2) be accessible by some coding scheme which
does not require the linear searching of the disk based data. To do
this, the system of Figure 2.22 was developed. In this system, a
"MAIN DATA FILE" was organized in which each set of responses for each
subject is allocated a directly accessible block of 64 sixteen bit words,
60 for the subject's responses and four for a label. To go with this
main file, four "POINTER FILES" were developed. The first pointer file,
the "PARM IDENTITY FILE," has an entry for each PARM giving basic PARM
data, such as systems involved, speakers involved, and a pointer to the
main data file. The second pointer file, the "SPEAKER FILE," has
information for each speaker as to where each PARM involving that speaker
is located. The third file, the "SUBJECT FILE," contains a list, by
subject, of where each of that subject's responses is located. The
last pointer file, the "SYSTEM FILE," contains, for each system, the
location of all that system's subjective data.

[Figure boxes: SYSTEMS POINTER FILE; MAIN PARM DATA FILE (60 RESPONSES
PER RECORD); SUBJECT POINTER FILE; SPEAKER POINTER FILE; PARM POINTER FILE]

FIGURE 2.22  LAYOUT OF PARM ACCESS DATA USED AS PART OF THIS STUDY.
             EACH BOX REPRESENTS A DISK FILE. THE DATA IS PRESORTED
             IN THE DATA FILES TO ALLOW EASY ACCESS OF THE PARM
             DATA SETS.

The idea behind this organization is that, by presorting on the
information of potential data subsets of interest, the average access
time for a particular statistical measure can be greatly reduced.
Hence, a statistical program need only search the much smaller pointer
files for information rather than searching the whole data base.
Further, since within each pointer file the data is ordered by increasing
PARM number, only a minimum number of accesses of the main data
file are necessary on a particular run.
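The record layout described above can be illustrated with a small sketch. The function names, the little-endian byte order, and the placement of the four label words ahead of the 60 responses are assumptions; only the 64-word, sixteen-bit, two's complement record size comes from the text.

```python
import io
import struct

RECORD_FORMAT = "<64h"                        # 64 signed 16-bit words (byte order assumed)
RECORD_BYTES = struct.calcsize(RECORD_FORMAT)

def write_record(f, index, label, responses):
    """Write one fixed-size record at position `index` of the main data file."""
    if len(label) != 4 or len(responses) != 60:
        raise ValueError("a record holds 4 label words and 60 responses")
    f.seek(index * RECORD_BYTES)
    f.write(struct.pack(RECORD_FORMAT, *label, *responses))

def read_record(f, index):
    """Directly access record `index` without scanning earlier records."""
    f.seek(index * RECORD_BYTES)
    words = struct.unpack(RECORD_FORMAT, f.read(RECORD_BYTES))
    return words[:4], words[4:]
```

A pointer file then reduces to a presorted mapping from a key to record numbers, for example `speaker_index = {"LL": [0, 3], "CH": [1, 2]}` (hypothetical contents), so that a statistical program reads only the records it needs via `read_record`.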

Two things should be noted about this data base organization.
First, the presorting of this data is a non-trivial computational task,
involving many hours of computer sorting. This data base itself,
therefore, is an important output of this effort, and may be used in
the future for many classes of studies. Second, due to time constraints,
DCEC was unable to make available enough information concerning the
PARM data to take full advantage of this data base. Hence, the
statistical resolving power afforded by this data base is better than
that achieved by this study. Details of how the analysis could be
improved are given later in this section.

2.4.2  The  Statistical  Analysis 

The objective measures used in this study are shown in Table
2.11. The measures involved are essentially all the spectral distance
measures used in the controlled distortion study (Section 2.3) plus
one additional measure which has had some attention in the literature
[2.38].

    1.  D1 LOG LPC
    2.  D1 LOG LPC GAIN WEIGHTED
    3.  D2 LOG LPC
    4.  D2 LOG LPC GAIN WEIGHTED
    5.  D4 LOG LPC
    6.  D4 LOG LPC GAIN WEIGHTED
    7.  D2 LINEAR
    8.  D2 LINEAR GAIN WEIGHTED
    9.  D2 CEPSTRUM
   10.  D2 CEPSTRUM GAIN WEIGHTED
   11.  D2 PARCOR
   12.  D2 FEEDBACK
   13.  D2 AREA
   14.  D2 POLE LOCATION
   15.  D2 ENERGY RATIO

Table 2.11  Objective Measures Used in the PARM Correlation Study.

The speech data used for this study was twelve sentences for
each of two speakers (LL and CH) for each of the systems of Table 2.11.
After the measures were applied, the statistical analysis performed was
identical to that done for the controlled distortion tests.

In the correlation study, the categories recognized were
"SUBJECT" and "SPEAKER." If the information had been available as to
exactly which sentence was involved in which PARM, then "SENTENCE"
could have been a category, increasing the degrees of freedom by
approximately a factor of six. The correlation coefficients calculated
were from

$$\hat{\rho} = \frac{1}{K} \sum_{\text{subjects}} \sum_{\text{speakers}} \sum_{\text{systems}} \rho_a \tag{2.25}$$

where

$$\rho_a = \left(\frac{D_a - \bar{D}}{\hat{\sigma}_D}\right)\left(\frac{X_a - \bar{X}_s}{\hat{\sigma}_s}\right) \tag{2.26}$$

where "a" is the condition including subject, speaker, and system,
$D_a$ is the distortion measure for that system, $\bar{D}$ is the estimated
mean of $D$, $X_a$ is the subject's response to condition "a", $\bar{X}_s$ is
the average response for that subject over all systems, $\hat{\sigma}_s$ is the
sample standard deviation for the subject "s," and $\hat{\sigma}_D$ is the
sample standard deviation for the objective distortion measures.
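The per-subject normalization in Equation 2.26 can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: K is taken to be the total number of (subject, speaker, system) terms, the container layout is hypothetical, and the sample values are invented.

```python
from statistics import mean, stdev

def pooled_correlation(responses, distortions):
    """Average of normalized products, in the spirit of Equations 2.25 and 2.26.

    responses:   {subject: {(speaker, system): X_a}}
    distortions: {(speaker, system): D_a}
    """
    d_values = list(distortions.values())
    d_mean, d_std = mean(d_values), stdev(d_values)
    total, count = 0.0, 0
    for by_condition in responses.values():
        x_values = list(by_condition.values())
        # Each subject is centered and scaled with his own mean and deviation.
        x_mean, x_std = mean(x_values), stdev(x_values)
        for condition, x in by_condition.items():
            d = distortions[condition]
            total += ((d - d_mean) / d_std) * ((x - x_mean) / x_std)
            count += 1
    return total / count  # K assumed equal to the number of terms summed

# Hypothetical data: two subjects who use different scales but agree in ranking.
distortions = {("LL", "A"): 1.0, ("LL", "B"): 2.0, ("CH", "A"): 3.0, ("CH", "B"): 4.0}
responses = {
    1: {c: 90.0 - 10.0 * d for c, d in distortions.items()},
    2: {c: 50.0 - 5.0 * d for c, d in distortions.items()},
}
rho_hat = pooled_correlation(responses, distortions)
```

Because each subject is normalized separately, the two subjects contribute equally even though their rating scales differ. With sample standard deviations and K equal to the number of terms, perfectly anticorrelated data gives -(n-1)/n rather than exactly -1 (here -0.75 for n = 4 conditions per subject).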

In order to understand how these results are tabulated, it is
first necessary to understand how results from the objective measures
can be used to predict results from subjective tests.
The most straightforward way of deriving an estimate of the
subjective quality is now given. Since both the subjective and objective
measures for quality are means of a large number of independent
estimates, their marginal probability distribution functions are
asymptotically normal, and, by the Bivariate Central Limit Theorem,
the joint probability distribution function is given by the Bivariate
normal distribution:


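$$f(X,D) = \frac{1}{2\pi\sigma_X\sigma_D\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{X-\bar{X}}{\sigma_X}\right)^2 - \frac{2\rho(X-\bar{X})(D-\bar{D})}{\sigma_X\sigma_D} + \left(\frac{D-\bar{D}}{\sigma_D}\right)^2\right]\right\} \tag{2.27}$$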


where $X$ is the subjective measure, $D$ is the objective measure, $\sigma_X^2$ is the
variance of the subjective measure, $\sigma_D^2$ is the variance of the objective
measure, and $\rho$ is the correlation coefficient. For this case, the
minimum variance unbiased estimator of $X$ from $D$ is given by

$$\hat{X} = \bar{X} + \frac{\rho\,\sigma_X}{\sigma_D}(D-\bar{D}) \tag{2.28}$$

where the variance of this measure is given by

$$E\left(X - E(X|D)\right)^2 = \sigma_X^2(1-\rho^2). \tag{2.29}$$
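As a numerical illustration of Equations 2.28 and 2.29, the sketch below predicts a subjective score from an objective distance and reports the spread left over after prediction. All parameter values are invented for the example and are assumed known exactly.

```python
import math

def predict_subjective(d, x_mean, d_mean, sigma_x, sigma_d, rho):
    """Equation 2.28: estimate of X given the objective measure D."""
    return x_mean + rho * (sigma_x / sigma_d) * (d - d_mean)

def prediction_std(sigma_x, rho):
    """Square root of Equation 2.29: standard deviation remaining after prediction."""
    return sigma_x * math.sqrt(1.0 - rho ** 2)

# Hypothetical values: a distance one unit above its mean, with rho = -0.8.
x_hat = predict_subjective(d=3.0, x_mean=60.0, d_mean=2.0,
                           sigma_x=12.0, sigma_d=1.5, rho=-0.8)
spread = prediction_std(sigma_x=12.0, rho=-0.8)
```

With these numbers the predicted score is 53.6 and the residual standard deviation is 7.2, that is, 60 percent of the original 12. Even a correlation of magnitude 0.8 therefore removes less than half of the standard deviation.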


If $\bar{X}$, $\bar{D}$, $\sigma_X$, $\sigma_D$, and $\rho$ were known, this problem would be solved, since
this is enough information to calculate confidence intervals on $X$ or to
do null hypothesis testing between systems. However, estimates for
these quantities, called $\hat{\bar{X}}$, $\hat{\bar{D}}$, $\hat{\sigma}_X$, $\hat{\sigma}_D$, and $\hat{\rho}$, must be used instead,
and these quantities are random variables themselves. Hence, the p.d.f.
(Probability Distribution Function) is no longer normal, and is, in
general, very difficult to calculate in closed form.

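However, considering this problem from the point of view of
regression analysis theory offers additional information. The form of
the linear regression estimation is given by

$$X = \beta_1 + \beta_2 D. \tag{2.30}$$

From the Gauss-Markov Theorem [2.40], the least squares estimate is the unbiased
minimum variance estimate for $X$, and for this case (this is really an
LPC analysis)

$$\hat{\beta}_2 = \frac{N\sum_{j=1}^{N} X_j D_j - \sum_{j=1}^{N} X_j \sum_{j=1}^{N} D_j}{N\sum_{j=1}^{N} D_j^2 - \left(\sum_{j=1}^{N} D_j\right)^2} = \frac{\hat{\rho}\,\hat{\sigma}_X}{\hat{\sigma}_D} \tag{2.31}$$

and

$$\hat{\beta}_1 = \frac{1}{N}\left(\sum_{j=1}^{N} X_j - \hat{\beta}_2 \sum_{j=1}^{N} D_j\right). \tag{2.32}$$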


Two points should be made here. First, these results show that the
minimum variance unbiased estimator of $X$ from $D$ is obtained by using the
minimum variance unbiased estimates for $\bar{D}$, $\bar{X}$, and $\rho$ in
Equation 2.28. Second, it should be noted that under a mild set of
conditions easily met by the tests here, four conditions hold:

(1) a minimum variance unbiased estimate for $\sigma_e^2$, the variance in our
approximation of the subjective quality, is given by

$$\hat{\sigma}_e^2 = \frac{1}{N-2}\sum_{j=1}^{N}\left(X_j - \hat{\beta}_1 - \hat{\beta}_2 D_j\right)^2 \tag{2.32}$$

(2) a minimum variance unbiased estimate for the variance in $\hat{\beta}_1$ is
given by

$$\hat{\sigma}_{\beta_1}^2 = \hat{\sigma}_e^2\left(\frac{1}{N} + \frac{\bar{D}^2}{\sum_{i=1}^{N}(D_i-\bar{D})^2}\right) \tag{2.33}$$

(3) a minimum variance unbiased estimate for the variance in $\hat{\beta}_2$ is
given by

$$\hat{\sigma}_{\beta_2}^2 = \frac{\hat{\sigma}_e^2}{\sum_{i=1}^{N}(D_i-\bar{D})^2} \tag{2.34}$$


and (4) the estimates for $\beta_1$ and $\beta_2$ ($\hat{\beta}_1$ and $\hat{\beta}_2$) are normally distributed,
the statistics formed from $\hat{\sigma}_e^2/\sigma_e^2$, $\hat{\sigma}_{\beta_1}^2/\sigma_{\beta_1}^2$, and $\hat{\sigma}_{\beta_2}^2/\sigma_{\beta_2}^2$ are $\chi^2$ distributed, and all five
estimates are independent. These four points give all of the statistical
power necessary to do all the hypothesis testing and confidence
interval estimation which is normally associated with statistical
testing and estimation. For example, if a confidence interval for
$\beta_1$ was desired, it is only necessary to note that $(\hat{\beta}_1 - \beta_1)/\hat{\sigma}_{\beta_1}$ is t
distributed, and the confidence interval is given by

$$\hat{\beta}_1 - \hat{\sigma}_{\beta_1} U_{\alpha(N-2)} < \beta_1 < \hat{\beta}_1 - \hat{\sigma}_{\beta_1} L_{\alpha(N-2)} \tag{2.35}$$

where $U_{\alpha(N-2)}$ and $L_{\alpha(N-2)}$ are the upper and lower significance limits for
a t distributed variable ($\mu = 0$, $\sigma = 1$) for N-2 degrees of freedom and probability
$\alpha$.
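The regression estimates and the confidence interval can be sketched in a few lines. This is a minimal illustration, assuming the model X = b1 + b2*D with b1 the intercept and b2 the slope; the data are invented, and the t limit is passed in from a table rather than computed.

```python
import math

def fit_line(d, x):
    """Least squares fit of x = b1 + b2 * d, with residual and coefficient deviations."""
    n = len(d)
    d_mean = sum(d) / n
    x_mean = sum(x) / n
    s_dd = sum((dj - d_mean) ** 2 for dj in d)
    s_dx = sum((dj - d_mean) * (xj - x_mean) for dj, xj in zip(d, x))
    b2 = s_dx / s_dd                       # slope
    b1 = x_mean - b2 * d_mean              # intercept
    resid = sum((xj - b1 - b2 * dj) ** 2 for dj, xj in zip(d, x))
    var_e = resid / (n - 2)                # residual variance, N-2 degrees of freedom
    se_b1 = math.sqrt(var_e * (1.0 / n + d_mean ** 2 / s_dd))
    se_b2 = math.sqrt(var_e / s_dd)
    return b1, b2, math.sqrt(var_e), se_b1, se_b2

def confidence_interval(estimate, std_err, t_upper):
    """Two-sided interval; the t density is symmetric, so the lower limit is -t_upper."""
    return estimate - std_err * t_upper, estimate + std_err * t_upper

# Hypothetical objective distances and mean subjective scores for five systems.
d = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [80.0, 71.0, 59.0, 52.0, 38.0]
b1, b2, sigma_e, se_b1, se_b2 = fit_line(d, x)
# 3.182 is the tabulated two-sided 95 percent t limit for 3 degrees of freedom.
low, high = confidence_interval(b2, se_b2, t_upper=3.182)
```

With only five points the interval on the slope is wide, which mirrors the remark in the text that the estimates themselves are random variables and limit the usable resolving power.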

There are really two questions which these tests seek to answer.
First, assuming that the estimates we have for correlations, means, and
variance are exactly correct, what would then be the confidence intervals
on our estimates of $X$? This question seeks to ascertain the potential
of the objective measures used here to predict subjective results.

Second, considering all the distorting factors in our analysis, especially
our errors in estimating $\beta_1$ and $\beta_2$, what then is the resolving power
of our test? This question addresses the usable resolving power of
subjective acceptability estimates based on the analysis performed so
far. The answer to the first question can be addressed by applying
Equation 2.29 to the estimate (Equation 2.25) of
the correlation coefficients. The answer to the second question can be
observed by applying Equation 2.32 to the data.

2.4.3  The Experimental Results

The correlation studies described above were carried out on
three sets of the data: all the systems; only the vocoder systems
(LPC and channel vocoders); and only the waveform coders. The results
for the three studies are given in Tables 2.12, 2.13, and 2.14,
respectively. Several points should be made here. First, the correlation
coefficients for a number of measures are quite high, some as high
as .83 in magnitude. The "BEST" measures seem to be gain weighted spectral distance
measures, as expected. Second, however, note that the estimated
standard deviations are somewhat larger than desirable. This indicates
that more data should be used to better establish these results. Third,
note that much better results are obtained for the small subclasses than
for the whole. This indicates that these measures work best if the
systems being tested are preclassified according to the type of
distortion expected.

DISTORTION MEASURES               $\hat{\rho}$    $\hat{\sigma}_{eI}$

D1 LOG LPC                        -.76    10.24
D1 LOG LPC GAIN WEIGHTED          -.79     8.13
D2 LOG LPC                        -.78     8.85
D2 LOG LPC GAIN WEIGHTED          -.81     7.21
D4 LOG LPC                        -.73    14.31
D4 LOG LPC GAIN WEIGHTED          -.78     8.31
D? LINEAR LPC                     -.61    17.21
D? LINEAR LPC GAIN WEIGHTED       -.66    13.21
D? CEPSTRUM                       -.79     7.64
D? CEPSTRUM GAIN WEIGHTED         -.81     6.98
D2 PARCOR                         -.55    22.1
D2 FEEDBACK                       -.23    37.1
D2 AREA                           -.76    12.41
D2 POLE LOCATION                  -.25    21.6
D2 ENERGY RATIO                   +.78     9.2

$\hat{\rho}$ = Correlation estimate
$\hat{\sigma}_{eI}$ = Ideal standard deviation estimate
$\hat{\sigma}_e$ = Standard deviation estimate (full statistics)

Table 2.12  Results of Correlation Study
For Total Set of Systems.


SPECTRAL DISTORTION MEASURES      $\hat{\rho}$    $\hat{\sigma}_{eI}$    $\hat{\sigma}_e$

D1 LOG LPC                        -.79     8.13    14.23
D1 LOG LPC GAIN WEIGHTED          -.81     7.15    12.2
D2 LOG LPC                        -.79     8.27    18.3
D2 LOG LPC GAIN WEIGHTED          -.83     6.63    13.4
D4 LOG LPC                        -.77     8.95    18.1
D4 LOG LPC GAIN WEIGHTED          -.81     7.29    14.9
D? LINEAR LPC                     -.70    16.31    31.6
D? LINEAR LPC GAIN WEIGHTED       -.74    14.52    28.4
D? CEPSTRUM                       -.81     7.52    13.72
D? CEPSTRUM GAIN WEIGHTED         -.83     6.81    13.14
D2 PARCOR                         -.61    18.22    34.31
D2 FEEDBACK                       -.33    29.2     43.21
D2 AREA                           -.78    10.21    21.21
D2 POLE LOCATION                  -.36    36.3     61.3
D2 ENERGY RATIO                   +.80     7.82    14.9

$\hat{\rho}$ = Correlation estimate
$\hat{\sigma}_{eI}$ = Ideal standard deviation estimate (assuming $\hat{\rho} = \rho$)
$\hat{\sigma}_e$ = Standard deviation estimate (full statistics)

Table 2.13  Results of Correlation Study
Using Only Vocoders.


SPECTRAL DISTORTION MEASURES      $\hat{\rho}$    $\hat{\sigma}_{eI}$    $\hat{\sigma}_e$

D1 LOG LPC                        -.79     8.23    14.12
D1 LOG LPC GAIN WEIGHTED          -.80     7.91    13.90
D2 LOG LPC                        -.78     9.41    18.91
D2 LOG LPC GAIN WEIGHTED          -.82     6.78    12.21
D4 LOG LPC                        -.76    12.2     24.31
D4 LOG LPC GAIN WEIGHTED          -.80     7.98    18.32
D? LINEAR LPC                     -.73    14.23    29.31
D? LINEAR LPC GAIN WEIGHTED       -.75    12.9     26.21
D? CEPSTRUM                       -.79     9.[illegible]    16.51
D? CEPSTRUM GAIN WEIGHTED         -.81     6.91    12.91
D2 PARCOR                         -.58    27.4     42.95
D2 FEEDBACK                       -.21    40.2     51.2
D2 AREA                           -.74    18.4     40.91
D2 POLE LOCATION                  -.31    29.6     51.9
D2 ENERGY RATIO                   +.76    16.3     33.6

$\hat{\rho}$ = Correlation estimate
$\hat{\sigma}_{eI}$ = Ideal standard deviation estimate (assuming $\hat{\rho} = \rho$)
$\hat{\sigma}_e$ = Standard deviation estimate (full statistics)

Table 2.14  Results of Correlation Study Using
Only Waveform Coders.


These are certainly encouraging results. With measures as
highly correlated as these, there is good expectation of creating a
viable objective quality test. However, the relatively large estimated
standard deviations in the estimates which include all statistics
indicate more data must be processed to increase the resolving power
of these tests to a maximum.

2.5  Summary  and  Areas  for  Future  Research 

The major results of this study can be summarized as follows.

(1)  A number  of  objective  quality  measures,  particularly 
spectral  distance  metrics,  offer  considerable  promise  in  predicting 
subjective  quality  results. 

(2)  Some of the measures tested are clearly better than the
others. The best are the gain weighted log LPC spectral distance
measure and the gain weighted cepstral measure. These two measures
are highly correlated with each other [2.35].

(3)  Several measures do consistently poorly. Two of these are
the D2 feedback coefficient measure and the pole location measure.
The pole location measure would probably improve if some sort of formant
extraction was attempted.

(4)  The  ^2  measure  did  quite  well.  This  is  interesting 

since it is so computationally compact.

(5)  Gain weighting gave a slight, but consistent, improvement
in the subjective-objective correlations.

(6)  Based on the values of $\hat{\rho}$ obtained in this study, the
potential for using several of the measures for predicting subjective
scores is good. However, it should be noted that, even if $\hat{\rho} = \rho$, the
resolving power of these tests falls short (by approximately a power
of 2-2.5) of the subjective tests themselves. However, subjective and
objective measures may be combined to improve resolution. This is
easily done so long as the number of subjective tests used warrants the
use of the Bivariate Normal Distribution.


(7)  The resolving power of the actual tests which resulted from
this study is nowhere near as good as the "potential" resolving power.
This is because the resolving power of the tests in this study on $\hat{\rho}$
was not good enough. This could be improved by doing a lower level
correlation between a subject's response and the objective measure for
the exact sentence used, and by using a larger portion of the PARM data
base as part of the study. It should be noted, however, that although
it is interesting to speculate on the improvement in the estimates of
$\rho$ that further testing would accomplish, no results should be assumed
until the testing is complete.

The results of this study offer a number of areas for future
research. Some of these are listed below.

(1)  An obvious extension to this study would be to extend the
portion of the PARM data base used in this study. This might well
improve its estimates for $\rho$.

(2)  Statistically improved results may also obviously be
obtained by finding measures which are more highly correlated with subjective
results. One approach is to simultaneously attempt to better
understand the parametric factors involved in human quality acceptance,
as has been attempted in the "QUART" and "DAM" tests, and to develop
objective measures which are highly correlated with the important
parametric subjective measures.

(3)  Improvements are possible in the particular objective measures
used in the correlation studies. For example, Makhoul [2.13] suggests
several forms of frequency weighting in LPC spectral distance measures
which might be used to improve subjective-objective correlation.


CHAPTER 3


SUBJECTIVE PREDICTION OF USER PREFERENCE

3.1  Introduction 

A crucial issue in the design and implementation of a digital
voice communication system is the prediction of user acceptability.
Even if the many other system design criteria are resolved and a good
engineering solution found, the system will fail unless people use it.
People will use it only if they find it highly acceptable on the basis
of their current telecommunications alternatives.

Speech testing has been categorized as quality testing or
intelligibility testing. The term preference testing or acceptability
testing really supersedes both terms, not as a replacement for either,
but as a combination of the essential features of each. That is,
preference is assumed to be based on a sufficient combination of quality
and intelligibility to determine relative user acceptability. It must
be recognized here that speech of 100% intelligibility may yet be of unacceptable
quality and hence of low preference, just as pleasant but unintelligible
speech is of low preference.

Just as with quality and intelligibility testing, preference
testing can be implemented with a wide variety of strategies or
methodologies. The test may be subjective, objective, parametric,
isometric, based on absolute or relative scales, with an infinite
variety of organizations. Fortunately, much work has been done in the
testing of speech, so that we do not need to begin from scratch.

In this chapter we will consider subjective testing. Objective
testing, another phase of this effort, is considered in Chapter 2.

3.2  Subjective  Testing  Philosophies 

Subjective testing procedures are based on drawing from a
population of potential system users, i.e., subjects, their reaction to
the speech produced by a digital speech transmission system. These
reactions must be quantified somehow and are then averaged, or processed,
according to established statistical principles to arrive at a measure
of user acceptance or preference. The basic testing philosophies can
be listed as follows:

Iso-Preference Testing - involves the use of a known, agreed
upon reference signal condition for use as a comparison in judging an
unknown. The agreed upon condition must be parameterized so that
the unknown or test signal can be found equally acceptable to an
adjustment of the parameter set. This procedure then yields the
judgement that a given signal is as acceptable as some reference
condition.

Relative Preference Testing - involves comparisons, done independently,
with each of several reference conditions. The reference
conditions are used to establish a scale of preference, and an unknown
signal can then be ranked on this scale. The subjective scale of the
references must be agreed upon a priori.

Absolute Preference Testing - methods require the subjects
performing the test to give an absolute numerical evaluation to the
properties described in the test format. Properties tested can be
selected to describe various features of interest.

Isometric Testing for user preference calls for a direct evaluation
of preference from the test subjects. Each subject makes his
evaluation against the background of his total experience and personal
biases, and including any local or instantaneous bias with fatigue or
irritability effects built into his response.

Parametric Testing asks the test subject to make judgements with
respect to specific features of the speech signal under consideration.
The test format then has the flexibility of later weightings of feature
judgements to achieve a measure of acceptability which is more independent
of the individual subject's biases. The appropriate weightings must be
agreed upon in the final resolution of test data, however.


The most recent application of these philosophies has resulted
in the PARM test and the QUART test [3.1] and more recently in the DAM
test [3.2].

In the PARM test (Paired Acceptability Rating Method) an isometric
approach is used. However, since systems being tested are
presented to the subjects in a carefully chosen ordering, paired
comparisons can be abstracted from the test results on an a posteriori
basis. To reduce the effects of extremes of responses typical in
isometric testing, the listeners are asked to judge two reference or
anchor conditions, one "good" and one "bad" anchor. Anchor responses
are then used to normalize other responses within and across listeners.
Details of the testing organization and exhaustive analysis of results
are found in [3.1].
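One simple way anchors could be used to place listeners on a common scale is sketched below. The linear mapping and the 0 to 100 target scale are assumptions made for illustration only; the normalization actually used in the PARM analysis is documented in [3.1] and may differ.

```python
def normalize_by_anchors(responses, good_anchor, bad_anchor):
    """Map one listener's ratings so that bad anchor -> 0 and good anchor -> 100.

    This linear rescaling is a hypothetical choice, not the PARM procedure.
    """
    span = good_anchor - bad_anchor
    if span == 0:
        raise ValueError("anchor ratings must differ")
    return [100.0 * (r - bad_anchor) / span for r in responses]

# Two hypothetical listeners who rate the same systems on different personal scales.
listener_a = normalize_by_anchors([30.0, 50.0, 70.0], good_anchor=70.0, bad_anchor=30.0)
listener_b = normalize_by_anchors([55.0, 65.0, 75.0], good_anchor=75.0, bad_anchor=55.0)
```

After rescaling, both listeners give the middle system a score of 50, so differences in personal origin and scale no longer inflate the variance across listeners.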

In the QUART test (Quality Acceptance Rating Test) the parametric
philosophy is followed, with an isometric measure of overall acceptability
included as well. The listener is asked to score each system under
test with respect to a family of features and to give his overall reaction.
Extensive analysis of this approach is also well documented [3.1].

An outgrowth of the background of subjective testing of speech
in general and of experience with PARM and QUART in particular, after
substantial further refinement in the choice of a family of features to
use in direct response solicitation, is the DAM test (Diagnostic
Acceptability Measure).

The DAM test acquires ratings on perceptual features which have
been selected after extensive experience with QUART as those features
closely correlated with overall acceptability, nearly orthogonal to each
other, and directly related to specific system functions or to system
operating environment conditions. In addition, the feature set thus
extracted is small enough to allow efficient and reasonable subjective
testing to be accomplished. The DAM test is still evolving, but is
nearing a final form. Although it is not yet documented in the literature,
the test has been the subject of substantial interaction between
the speech research group at Georgia Tech and the group at Dynastat.
These discussions have been conducted in visits by A. M. Bush and T. P.
Barnwell to Dynastat and by W. D. Voiers to Georgia Tech. A detailed
description of the DAM test is included as Appendix A of this report.

3.3  Statistical Testing Procedures

In subjective testing, as mentioned earlier, an essential aspect
of the test implementation is the statistical processing of the data,
i.e., responses from listeners or subjects, to obtain an average rating
of the system or system feature under test. Even though the field of
statistics is well documented, both in the scientific literature and in
textbook and reference book formats, it is our feeling that some exposition
here may be worthwhile. Our point of view (necessarily!) is
that of the communications engineer with a background in probability,
random variables, and stochastic processes, who feels he should therefore
know all about statistics until he reads a little in the area.

In order to apply statistics to the results of subjective testing,
one must either base the statistics on assumptions regarding the underlying
distributions of the individual listener responses, the parametric
approach, or assume that these underlying distributions are unknown and
work within, for example, ranking statistics, the nonparametric approach.
The parametric approach is treated from a theoretical viewpoint in many
places: our favorites are Wilks [3.3] and Cramer [3.4]. The nonparametric
approach is also extensively treated, but our favorite here
is Hajek [3.5]. For applications with a minimum of theory, a good
reference among a great many possible choices is Winer [3.6] or Siegel
[3.7] for parametric or nonparametric tests, respectively.

In the parametric approach, the most common assumption regarding
the distribution of the listener responses is that they are all
Gaussian. Hypotheses with respect to common means and/or variances
under test conditions can then be set up and inferences drawn by
comparisons with standardized tables.

3.3.1  Distributions

The key distributions are summarized below for convenience.

Chi Square

Let $X_1, \ldots, X_n$ be independent, identically distributed
Gaussian random variables, each with zero mean and unit variance. Then

$$\chi^2 = \sum_{i=1}^{n} X_i^2 \tag{3.3.1}$$

is a new random variable, with a distribution called Chi-square with n
degrees of freedom. The probability density function is given by

$$f_{\chi^2}(x) = \begin{cases} \dfrac{x^{n/2-1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.3.2}$$
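A quick numerical check of the Chi-square density is sketched below, using the standard textbook form of the density with n degrees of freedom. The helper names and the choice n = 4 are illustrative only. The density should integrate to one, and its mean should equal n.

```python
import math

def chi_square_pdf(x, n):
    """Chi-square density with n degrees of freedom (returned as 0 for x <= 0 here)."""
    if x <= 0.0:
        return 0.0
    return (x ** (n / 2.0 - 1.0) * math.exp(-x / 2.0)
            / (2.0 ** (n / 2.0) * math.gamma(n / 2.0)))

def midpoint_integral(func, lo, hi, steps):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

n = 4
area = midpoint_integral(lambda x: chi_square_pdf(x, n), 0.0, 80.0, 8000)
first_moment = midpoint_integral(lambda x: x * chi_square_pdf(x, n), 0.0, 80.0, 8000)
```

The upper limit of 80 is far enough into the tail that the truncated mass is negligible for small n.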


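F-Distribution

Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be n+m independent, identically
distributed Gaussian random variables each with zero mean and unit
variance. Then the ratio

$$F = \frac{\frac{1}{m}\sum_{i=1}^{m} X_i^2}{\frac{1}{n}\sum_{j=1}^{n} Y_j^2} \tag{3.3.3}$$

is a random variable with a distribution called the F-distribution, with
parameters m and n. The probability density function is

$$f_F(x) = \begin{cases} \dfrac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\dfrac{m}{n}\right)^{m/2} x^{m/2-1} \left(1 + \dfrac{m}{n}x\right)^{-(m+n)/2}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.3.4}$$

Student's Distribution

Let $X_0, X_1, \ldots, X_n$ be independent identically distributed Gaussian
random variables each with zero mean and unit variance. Let

$$t = \frac{X_0}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} X_i^2}} \tag{3.3.5}$$

Then t is a random variable which has a distribution called the Student's
distribution with parameter n. The probability density function is

$$f_t(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} \tag{3.3.6}$$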


Studentized Range Statistic

Let $X_1, \ldots, X_k$ be independent identically distributed Gaussian
random variables each with zero mean and unit variance. Define a
random variable Z as

$$Z = \max_i(X_i) - \min_i(X_i) \tag{3.3.7}$$

as shown in Figure 3.1. The probability density function of Z is

$$f_Z(x) = \begin{cases} k(k-1)\displaystyle\int_{-\infty}^{\infty} \left(F_X(\xi) - F_X(\xi - x)\right)^{k-2} f_X(\xi)\, f_X(\xi - x)\, d\xi, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.3.8}$$

where $F_X(\cdot)$ is the Gaussian cumulative distribution function and $f_X(\cdot)$
is the Gaussian probability density function, both for zero mean, unit
variance Gaussian random variables. This function is not available in
closed form unless k=2. Some points of the cumulative distribution
function for Z have been tabulated. See for example the tables of
Winer [3.6]. For a derivation of (3.3.8), see Appendix B of this report.
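Since the range density has no closed form for k greater than 2, it is usually evaluated numerically. The sketch below integrates the standard expression for the density of the range of k unit Gaussians with a midpoint rule and compares the k = 2 case against its closed form. The integration limits and step count are arbitrary choices made for this illustration.

```python
import math

def normal_pdf(x):
    """Zero mean, unit variance Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Zero mean, unit variance Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def range_pdf(x, k, lo=-8.0, hi=8.0, steps=4000):
    """Density of max - min of k unit Gaussians by midpoint-rule integration."""
    if x < 0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        xi = lo + (i + 0.5) * h
        total += ((normal_cdf(xi) - normal_cdf(xi - x)) ** (k - 2)
                  * normal_pdf(xi) * normal_pdf(xi - x))
    return k * (k - 1) * total * h

def range_pdf_k2(x):
    """Closed form for k = 2: |X1 - X2|, where X1 - X2 is Gaussian with variance 2."""
    return math.exp(-x * x / 4.0) / math.sqrt(math.pi) if x >= 0 else 0.0

numeric = range_pdf(1.0, 2)
closed = range_pdf_k2(1.0)
```

For k = 2 the two values agree to within the integration error, which gives some confidence in the numerical routine before it is used for larger k.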


3.3.2  Estimation 

We consider now some commonly used estimates of statistical
parameters.

Let $X_1, \ldots, X_n$ be independent identically distributed random
variables each with mean $\mu$ and variance $\sigma^2$. Then

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{3.3.9}$$

is called the sample mean. It is an unbiased estimate of the mean of
the $X_i$'s:

$$E[\bar{x}] = \mu \tag{3.3.10}$$

$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} \tag{3.3.11}$$
Variance

For $X_1, \ldots, X_n$ independent identically distributed random
variables, each with mean $\mu$ and variance $\sigma^2$, the sample variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2 \tag{3.3.12}$$

is an unbiased estimate of the variance of the $X_i$'s, with

$$E[s^2] = \sigma^2 \tag{3.3.13}$$

$$\mathrm{Var}[s^2] = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) \tag{3.3.14}$$

where $\mu_4$ denotes the fourth central moment. If the $X_i$'s are Gaussian
as well, then $\bar{x}$ and $s^2$ are best mean square estimates and are independent
random variables. Also, in this case,

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \tag{3.3.15}$$

is a random variable with the Student's distribution with (n-1) degrees
of freedom.
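The estimates above translate directly into code. The sketch below computes the sample mean, the unbiased sample variance, and the t statistic for a hypothesized mean; the ratings are invented for the example.

```python
import math
from statistics import mean, variance

def t_statistic(samples, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n-1 degrees of freedom."""
    n = len(samples)
    x_bar = mean(samples)
    s2 = variance(samples)  # unbiased estimate, divisor n-1
    return (x_bar - mu0) / math.sqrt(s2 / n)

# Hypothetical listener ratings for one system.
ratings = [62.0, 58.0, 65.0, 61.0, 59.0]
x_bar = mean(ratings)
s2 = variance(ratings)
t = t_statistic(ratings, mu0=60.0)
```

The resulting value is compared with a tabulated Student's t limit for n-1 = 4 degrees of freedom to decide whether the mean differs from 60.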


3.3.3  Analysis of PARM Data

As an example of the application of the above results, let us
consider the problem of analysis of the PARM data. Let

$R_{ijk}$ = the response of listener i to system j
on the kth presentation

For a particular PARM module of data, we have

$1 \le i \le L$, where L = the number of listeners in the module

$1 \le j \le M$, where M = the number of systems in the module

$1 \le k \le 10S = T$, where T = the number of times a system is
presented in a module, and S
is the number of speakers in the
module.

For example, L=10, M=5 including anchors, S=3, T=30 might be a set of
parameters, with 1800 total responses in the module.

Let


(3.3.16)

$$\hat{\sigma}^2_{\text{error}} = \frac{1}{M(LT-1)} \sum_{j=1}^{M}\sum_{i=1}^{L}\sum_{k=1}^{T} \left(R_{ijk} - \bar{R}_j\right)^2 \tag{3.3.24}$$


Then, combining results, we have

$$\hat{\sigma}^2_{\text{total}} = \frac{M-1}{MLT-1}\,\hat{\sigma}^2_{\text{sys}} + \frac{M(LT-1)}{MLT-1}\,\hat{\sigma}^2_{\text{error}} \tag{3.3.25}$$

Now, if $\sigma^2_{\text{sys}} = \sigma^2_{\text{error}}$, that is, if the different systems themselves
contribute no systematic differences to the variance, then

$$\sigma^2_{\text{total}} = \sigma^2_{\text{sys}} = \sigma^2_{\text{error}} \tag{3.3.26}$$

The F-test is used to test the hypothesis that $\sigma^2_{\text{sys}} = \sigma^2_{\text{error}}$, by forming
the ratio of these variables, assuming that the Gaussian assumptions hold,
and utilizing tabulations of the cumulative distribution of the F
variable under the hypothesis. If the ratio is outside predetermined
bounds, the test is said to hold, that is, the two variances are not
the same. Otherwise, there is no conclusion. From the point of view
of statistical hypothesis testing, we test the hypothesis {systems
contribute no systematic difference}. If F is too large, we reject the
hypothesis. This amounts to considering the hypothesis against a
specified false alarm probability, and not giving any other measure of
performance.
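The variance decomposition and the F ratio can be sketched as a one-way analysis of variance over systems. The definitions of the system and total variance estimates used here are the usual textbook ones and are an assumption of this sketch; the responses for each system are pooled over listeners and presentations, and the data are invented.

```python
def system_f_ratio(responses):
    """One-way analysis of variance over systems.

    responses[j] holds all n = L*T responses to system j (equal lengths assumed).
    Returns (F, var_sys, var_err, var_total).
    """
    m = len(responses)
    n = len(responses[0])
    system_means = [sum(r) / n for r in responses]
    grand_mean = sum(system_means) / m
    ss_sys = n * sum((mj - grand_mean) ** 2 for mj in system_means)
    ss_err = sum((v - mj) ** 2 for r, mj in zip(responses, system_means) for v in r)
    ss_total = sum((v - grand_mean) ** 2 for r in responses for v in r)
    var_sys = ss_sys / (m - 1)            # M-1 degrees of freedom
    var_err = ss_err / (m * (n - 1))      # M(LT-1) degrees of freedom
    var_total = ss_total / (m * n - 1)    # MLT-1 degrees of freedom
    return var_sys / var_err, var_sys, var_err, var_total

# Hypothetical responses: three systems, four pooled responses each.
data = [[60.0, 62.0, 58.0, 60.0],
        [50.0, 52.0, 48.0, 50.0],
        [70.0, 72.0, 68.0, 70.0]]
f_ratio, var_sys, var_err, var_total = system_f_ratio(data)
m, n = len(data), len(data[0])
recombined = ((m - 1) * var_sys + m * (n - 1) * var_err) / (m * n - 1)
```

Here `recombined` reproduces the total variance computed directly from the data, which is the identity of Equation 3.3.25, and the large F ratio reflects the widely separated system means.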

For a comparison between pairs of means, one can use the studentized
range statistic as

$$q_{\alpha,R,f} \tag{3.3.25}$$

where $\alpha$ is the desired quantile point of the cumulative distribution
of the statistic, $R = (j) - (j') + 1$, $2 \le R \le M$, is the number of steps between
the $\bar{R}_j$'s being compared when all the $\bar{R}_j$'s are rank ordered, and
$f = M(LT-1)$ = degrees of freedom of $\hat{\sigma}^2_{\text{error}}$. When this test is organized
in matrix form to facilitate the comparison of all means for significance
of differences between pairs of means to level $\alpha$ of false alarm, the
procedure is called the Newman-Keuls test. (See Winer [3.6], pp. 80-81.)

3.3.4  Nonparametric  Tests 

In nonparametric testing, one declines to assume that the underlying
statistics are Gaussian. Then one ranks the responses corresponding
to their relative magnitude, either signed or unsigned. If the
conditions hypothesized give no systematic differences in responses, the
rankings will be purely random, resulting in statistics which for two
conditions may be derived fairly easily. Common two dimensional
nonparametric tests resulting from various ranking procedures are the
Wilcoxon test, the Median test, the Van der Waerden test, and the
Kolmogorov-Smirnov test. Hajek [ ] describes each of these tests
and gives underlying statistics for which each test is most powerful.
Unfortunately, no uniformly most powerful test exists. In situations
where underlying distributions may reasonably be assumed to be Gaussian,
a parametric test will in general be best.

Nonparametric tests comparing more than two conditions are more
difficult to compose than the comparisons of pairs of conditions, as
all the rank statistics are in the higher order case derived from
multinomial as opposed to binomial type distributions. Although some
references are made to such procedures in Lehmann [3.8], e.g. the
Kruskal-Wallis test, no convenient generally accepted multidimensional
nonparametric tests were found.
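
As an illustration of the two-condition ranking procedure described
above, the following is a minimal sketch of a Wilcoxon rank-sum test in
Python. It is not taken from the report: the function name
rank_sum_test, the use of the large-sample normal approximation, and the
omission of a tie correction in the variance are all assumptions made
for this example.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (w, z, p), where w is the sum of the ranks of sample x in
    the pooled, rank-ordered data. Tied values share the average of
    their ranks; no tie correction is applied to the variance.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n, m = len(x), len(y)
    mean = n * (n + m + 1) / 2          # expected rank sum if no difference
    var = n * m * (n + m + 1) / 12      # variance of the rank sum
    z = (w - mean) / var ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```

If the two conditions give no systematic difference, w stays close to
its expected value and p is large; a small p suggests the rankings are
not purely random.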


3.4  Conclusions and Recommendations

The following conclusions regarding subjective prediction of
user preference are drawn primarily on the basis of data available
from the analysis of the results of the PARM and QUART tests [3.1],
from the discussions at Georgia Tech and at Dynastat with W. D. Voiers,
and from the initial results of the DAM test.

3.4.1  Isometric Tests

In isometric tests such as PARM, the absolute rankings of system
conditions by individual listeners will have a high variance due to
individual listener idiosyncrasies and intralistener variability, in
addition to interlistener variability. These effects can be balanced
out by extremely careful post-test processing of responses to
establish common origins and scales within and across listeners. Such
processing is, inevitably, subject to some criticism, as any smoothing
of the data will also introduce some distortion of one kind as it
reduces other effects. Smoothing, centering, and scaling were
accomplished in the PARM tests based on the ratings and relative
ratings of the anchors. Although more efficient anchoring and
normalization procedures can clearly be devised, such tests will always
suffer from high variability, and hence require large groups of
listeners and many trials, and will always be subject to criticism due
to post-test normalization procedures.

3.4.2  Tests of Features

In order to devise an effective, efficient and reliable subjective
test, it is necessary to narrow the scope of the question asked about
the system. That is, a more specific response than "Do you like this?"
must be solicited. If the features of the speech which are perceptually
most important in determining the overall user acceptability can be
identified and quantified, then one can construct an acceptability
rating with less variability within and across listeners.

This then becomes a problem of feature extraction. Two fronts or
approaches to this problem can be found: (a) List all the conceivable
descriptions of features. Test. Analyze the data with correlation
analysis and try to find the features which are important empirically.
(b) Based on extensive experience with various systems, select the most
typical types of noises and degradations. Try to solicit responses
along these particular features. Include effects of the environment such
as background noises. Feature selection using method (a) was used in
QUART. Subsequent refinement using the ideas of (b) as well has led
to the parameter sets of DAM. It is our judgment, based on the results
of DAM, that the best subjective preference testing procedure
available now is DAM. It should be pointed out that until the extensive,
expensive, detailed test results of PARM and QUART it was not possible
to draw this conclusion; however, the detailed agreement of PARM and
QUART, and the subsequent development of DAM, leave no other conclusion.

3.4.3  Implementation of Subjective Tests

The monumental and time consuming task of conducting a subjective
listening test can be implemented with improved speed and
efficiency by using an interactive computer to control the test, collect
the data, and subsequently analyze the test data.

3.4.4  Size of the Test

The number of listeners which must be used in a subjective
testing procedure can be determined only after sufficient data is
accumulated with a particular test methodology or algorithm to permit
good estimation of the error variances. Then the number of responses
must be selected to give an adequate resolution of the data to separate
systems under test. Note that the required resolution also will depend
on how different the systems to be resolved are on the scale of interest.
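
The dependence of test size on error variance and on the separation
between systems can be sketched with a standard normal-approximation
sample size formula. This is an illustration only, not a procedure from
the report: the function name, the default significance level and power,
and the assumption of a two-sided comparison of two system means with a
known per-response standard deviation are all hypothetical choices.

```python
from math import ceil
from statistics import NormalDist


def responses_per_system(sigma, delta, alpha=0.05, power=0.80):
    """Approximate responses needed per system to resolve a mean
    difference `delta` when the per-response error standard deviation
    is `sigma` (two-sided test, normal approximation).

        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)
```

Under these assumptions, halving the difference to be resolved roughly
quadruples the number of responses required, which is why closely
matched systems are expensive to separate.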

3.4.5  Speaker Selection

The number of speakers has been found in QUART and PARM to be
less significant than previously thought, from the point of view of
statistical resolving power. However, from the point of view of system
design, it is clear that some systems will be highly biased toward low
pitched speech or moderately pitched speech, and perform quite poorly
on high pitched speech, or vice versa. Hence, it is considered essential
to use at least two, preferably three, speakers chosen to cover the
expected range of pitches. This strategy will at least isolate quickly
systems which will not, for example, respond to a female voice.

3.4.6  Overall Recommendations for Subjective Tests

The overall recommendation to come from this examination of
subjective tests and test facilities is the development of an interactive
computer based hardware facility for conducting a refined version of the
DAM test.
4.  A SUBJECTIVE COMMUNICABILITY TEST


4.1  Introduction

When judging the performance of highly intelligible speech
communications systems, one approach is to apply an isometric subjective
user acceptability test, such as the PARM. The hypothesis in such
tests is that subjects can judge, from listening to speech segments
played through the systems being tested, the overall expected
acceptability of a system. The problem with these tests is that the
subjects' responses represent a noisy measure of the actual
acceptability of a system. In this context, the "ACCEPTABILITY" of a
system is defined as the level to which complex communication tasks can
be accomplished while using the system.

A model which states the problem more clearly is one which
postulates a fixed cognitive resource available to a user of a
communication system. As was discussed in Chapter 2, due to the
multiplicity of acoustic cues for segmental and supersegmental features
in speech, and due to a listener's immense knowledge of the phonemics,
syntactics, and semantics of his language, a listener may well be able
to understand speech which is very distorted. The problem is that to do
so, he must devote a large portion of his cognitive resource just to
understanding what is being said. For a low quality system, therefore,
this leaves him relatively less cognitive resource to apply to the
communication task, making the communication task more difficult.

The definition of a "COMMUNICABILITY TEST," as used in this
chapter, is any test which tries to measure a user's performance on a
communication task while using a communication system. The idea is to
design tests in which users are not asked to rate systems, but rather
are asked to perform some task in which the subjects' performance may
be measured objectively. In order to be an acceptable communicability
test, therefore, the test must meet several requirements. First, the
communication task must be difficult enough so that a subject is using
most of his cognitive resource in performing the task even with no
system distortion. Second, a subject's performance on the task must be
easy to measure. Third, the test must be inexpensive to administer
because it has enough inherent resolving power to differentiate among
the communications systems without excessive subject costs. Last, the
test should not require the actual use of a communication system in the
test, so that simulated systems may also be tested.

This chapter describes the design and testing of one such
communicability test. Section 4.2 describes the design of the automated
subjective data acquisition system used to administer the test. Section
4.3 describes the details of the test itself. Section 4.4 describes the
data analysis done in the test. Section 4.5 describes the test results.
4.2  An Automated Speech Subjective Quality Testing Facility

One of the greatest sources of expense in performing subjective
speech quality tests is the large amount of manual data handling
required to prepare the test results for computer analysis. In order
to reduce this source of expense, an automated subjective data
acquisition system was developed at Georgia Tech.

A diagram of the hardware portion of the subjective data
acquisition system is shown in Figure 4.1. The system consists of six
"STATIONS," each of which has an earphone control console, a CRT, and
a total of 16 buttons: fifteen "DATA" buttons and one "CONTROL" button.
The CRT is used for transmitting alphanumeric data to the subjects
through the computer's D/A interfaces, while the buttons are used for
collecting subject responses. The audio for the system is supplied
by a Crown 800 analog tape recorder which is digitally controlled. In
general, 1 kHz tones are placed on one track of the analog tape to mark
the ends of test sequences. These tones can be detected by the
computer through a phase lock loop detector, and are used to accurately
position the recorder.

In order to administer the test and collect the data, a multitask
interpretive test control program, called "QUALGOL," was written.
The QUALGOL language is summarized in Table 4.1, and has all the
necessary elements (constants, variables, labels, loop control,
arithmetic, etc.) for a simple computer language. Using the QUALGOL
language, an experimenter can easily "program" a large class of
subjective tests on the quality testing facility. A program used for
administering some of the tests performed during this study is given
in Figure 4.2.
[Figure 4.1: Hardware for quality testing. Diagram labels: six station
quality facility; quality station; 15 buttons.]
TABLE 4.1
QUALGOL LANGUAGE

CONVENTIONS:
    V - VARIABLE
    N - CONSTANT

VARIABLES:
    A-Z

COMMANDS:
    C    CROWN           C(V)          RECEIVE FROM CROWN
                                         1 - TONE
                                         0 - NO TONE
                         C(N)          SEND TO CROWN
                                         1      FAST FORWARD
                                         2      STOP
                                         3      PLAY
                                         4      RECORD
                                         5      REWIND
                                         0,6,7  NO-OP
    D    DELAY           D(N)          DELAY N (.1 SEC) UNITS
    DI   DISPLAY         DI(N)         DISPLAY MESSAGE N
    E    END
    G    GET RESPONSES   G(V)          GET V RESPONSES, DECREMENT
                                       V TO ZERO
    I    INCREMENT       I(V)          INCREMENT V BY ONE
    J    JUMP            J(V,LABEL)    JUMP TO LABEL IF V=0
                         J(@,LABEL)    JUMP TO LABEL
    M    MESSAGE         M(N,"...")    DEFINE MESSAGE
    P    PRINT           P(V)          PRINT V
    S    SET             S(V,N)        SET V TO N
    T    TRACE           T             TRACE SWITCH
    W    WAIT            W(N)          WAIT N UNITS
    M(1,LISTEN@ TO@SAMPLE)
    M(2, )
    M(3,MAKE@CHOICE NOW)
    M(4,PLEASE@MAKE CHOICE@ NOW)
    M(5,NOW STUPID)
    S(E,-100)
    C(3)W(2)C(0)
L1  C(B)J(B,L1)J(@,L2)
L2  C(2)W(2)C(0)
LM  I(E)J(E,EN)DI(1)
    C(3)W(3)C(0)
L3  C(B)J(B,L4)J(@,L3)
L4  C(B)J(B,L4)J(@,L5)
L5  C(2)W(2)C(0)
    DI(2)W(10)
    DI(3)S(C,1)G(C)W(30)
    J(C,LM)DI(4)S(D,-10)
L7  W(10)J(C,LM)I(D)J(D,L8)J(@,L7)
L8  S(D,-10)DI(5)
L9  J(C,LM)W(10)I(D)J(D,LM)J(@,L9)
EN  END


FIGURE 4.2  AN EXAMPLE "QUALGOL" PROGRAM
USED TO ADMINISTER THE COMMUNICABILITY TESTS
4.3  The Experimental Format

The communicability test format chosen for this study was a
"Multiple Digit Recall" test similar to that studied by Nakatani at
Bell Labs. In this format, sequences of random digits are first
recorded by trained speakers, and then these utterances are played
through various distorting systems. The resulting sequences are then
played to subjects whose task is to "RECALL" the digits after a short
(1 sec.) wait. This test format meets all the basic criteria set
forth in the introduction, since the task does not require a quality
judgment on the part of the subjects, the test is simple to administer,
and the test does not require the communication system being tested to
be present.

The purpose of the study reported here was to study the usefulness
of this test format for evaluating communication systems both
from a resolution and cost point of view. It should be noted that this
study was a relatively small portion of the total effort, and the
results obtained should be considered preliminary in nature. The
tests were performed as follows. First, strings of random digits
were generated by the computer by a program which rejected all strings
which had double digits, had more than two digits in ascending or
descending sequence, or had more than two digits in ascending or
descending alternate (2-4-6, etc.) sequence. Forty random sequences were
generated in 6, 7, 8, 9, and 10 digit lengths. Second, the digit
strings were read into a high quality tape recording system by a
trained announcer from the student broadcast radio station. The digits
were read "as if there were a list," so that no internal groupings were
imposed on the numbers. Third, the number strings were low pass
filtered to 3.2 kHz and digitized at 8 kHz to 12 bits resolution.
The results were stored on three 2400 ft., 800 BPI, 9 track digital
tapes.
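
The rejection rules for the digit strings can be sketched as follows.
This is an illustration, not the original generating program: the
function names are hypothetical, "double digits" is assumed to mean two
identical adjacent digits, and "more than two digits in sequence" is
assumed to mean any three consecutive digits with a common step of 1
(e.g. 3-4-5) or 2 (e.g. 2-4-6), ascending or descending.

```python
import random


def acceptable(digits):
    """Return True if the digit list passes the rejection rules."""
    # Rule 1: no doubled (identical adjacent) digits.
    for a, b in zip(digits, digits[1:]):
        if a == b:
            return False
    # Rules 2 and 3: no three consecutive digits forming an ascending
    # or descending run with step 1 (3-4-5) or step 2 (2-4-6).
    for a, b, c in zip(digits, digits[1:], digits[2:]):
        step = b - a
        if c - b == step and abs(step) in (1, 2):
            return False
    return True


def make_strings(length, count, rng):
    """Draw random digit strings until `count` acceptable ones are found."""
    out = []
    while len(out) < count:
        candidate = [rng.randrange(10) for _ in range(length)]
        if acceptable(candidate):
            out.append(candidate)
    return out
```

For example, make_strings(7, 40, random.Random(0)) would produce forty
7 digit strings, matching the count used for each length in the text.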

In all, four sets of tests were performed. In the first
"preliminary" test, undistorted data was played to subjects to try
to determine an appropriate number of digits for the final tests.
In all, the subjects listened to 200 sequences consisting of 40 each of
6, 7, 8, 9, and 10 digit strings. As a result of this test, digit
sequence lengths of 7 and 8 were chosen.

In the remaining three tests, distortions were applied to the
number strings, and these were played to subjects. Each of these
three tests tested the undistorted strings against three levels of
easily perceivable distortions. In the first test, the distortions
were white Gaussian noise at a SNR of 10 dB, 8 dB and 5 dB. In the
second test, the distortions were low pass filtering at 2.4 kHz
cutoff frequency, 1.8 kHz cutoff frequency, and 1.2 kHz cutoff frequency.
In the third test, the distortions were ADPCM waveform coder
distortions at 24 kbps, 16 kbps, and 8 kbps. Each set of distortions
was played to 18 subjects for a total of 18x3x2x50=5400 responses.

4.4  The Data Analysis

The data analysis was done in three stages. First, the data is
entered into a general data base. Second, a program called "VERIFY"
examines the numbers for cases where the number of errors is greater
than three, or where the errors meet a set of special conditions
(reversals, dropped numbers, etc.). In each case, the experimenter
can choose to omit the subject data. Third, a program called "SCORE"
allows the analysis of the data base for the means and variances
necessary to use standard Student's t analysis and analysis of
variance techniques, and allows the calculation of extensive
correlation sets.

In all, three types of scoring procedures were applied to the
data. In the first procedure, each response string was scored to be
either correct or not correct, and no note was made of the number of
errors in the string. The score statistic for this method was the
percentage of incorrect strings for each subject, for each distortion,
and for each test.

In the second scoring procedure, each response string was matched
to the correct string, and the score was taken to be the total number
of incorrect digits. In this scoring procedure, all response strings
with missing digits or response strings with the wrong number of
digits were given a score of 4.

The third type of scoring was derived by classifying the types
of digit errors in the response strings. It was found that the
predominant type of error in the test was a two digit error obtained
from interchanging two digits. In the third scoring procedure, such
an inversion would be considered to be one error rather than two.
Rules were compiled to handle inversions of more than two numbers as
such cases appeared in the data.
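
The three scoring procedures can be sketched as below. This is an
illustration under stated assumptions, not the original "SCORE" program:
the function names are hypothetical, strings are represented as Python
strings of digit characters, only interchanges of two adjacent digits
are treated as a single inversion, the rules for inversions of more than
two numbers are not reproduced because the text does not give them, and
the score of 4 for wrong-length strings is assumed to carry over to the
third procedure.

```python
def score_first(response, target):
    """First procedure: 0 if the whole string is correct, 1 otherwise."""
    return 0 if response == target else 1


def score_second(response, target, wrong_length_score=4):
    """Second procedure: number of incorrect digits, or a fixed score
    of 4 when the response has the wrong number of digits."""
    if len(response) != len(target):
        return wrong_length_score
    return sum(r != t for r, t in zip(response, target))


def score_third(response, target, wrong_length_score=4):
    """Third procedure: like the second, but an interchange of two
    adjacent digits (an assumption of this sketch) counts as one error."""
    if len(response) != len(target):
        return wrong_length_score
    errors = 0
    i = 0
    n = len(target)
    while i < n:
        if response[i] == target[i]:
            i += 1
        elif (i + 1 < n and response[i] == target[i + 1]
              and response[i + 1] == target[i]):
            errors += 1  # adjacent interchange counted once
            i += 2
        else:
            errors += 1
            i += 1
    return errors
```

For the target "3714925" and the response "3741925", the second
procedure gives 2 errors while the third gives 1, which is the
distinction the text draws between the two methods.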

For the following discussion, each scored result will be referred
to by the designation x_{tsdn}, where t is the test number (t = 1 for
the additive noise test, t = 2 for the low pass filter test, and t = 3
for the ADPCM coding test), s is the subject number (18 per test,
1 <= s <= S, where S = 18), d is the distortion level (four for each
test: three distortions and "clear"; 1 <= d <= D, where D = 4), and
n is the number of results per subject (1 <= n <= N, where N = 1 for
the first scoring, and N = 10 for the last two). For each test,
analysis of variance was used to determine the significance of the
entire test, while the Student's t statistic was used to determine
statistical significance between distortions. In each test, the
first 10 responses were considered to be "training" responses, and
were not included in the results. The analysis of variance was
performed by calculating the F statistic given by

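    F_t = \frac{\frac{SN}{D-1} \sum_{d=1}^{D} (\bar{x}_{td} - \bar{x}_t)^2}
               {\frac{1}{D(SN-1)} \sum_{d=1}^{D} \sum_{s=1}^{S} \sum_{n=1}^{N}
                (x_{tsdn} - \bar{x}_{td})^2}                      (4.4.1)
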


and testing for significance using the appropriate F distribution,
while the pairwise significance was tested by calculating the t
statistic

    t = \frac{\bar{x}_{td_i} - \bar{x}_{td_j}}
             {\hat{\sigma}_t \sqrt{1/n_i + 1/n_j}}                (4.4.2)

and finding the significance from the t distribution.
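
The two statistics can be sketched in Python in their standard textbook
forms. This is an illustration only: the function names are
hypothetical, the F statistic is the usual one-way analysis of variance
ratio for equal-sized groups (one group per distortion level), and the
t statistic is the usual pooled-variance two-sample form, which may
differ in detail from the forms used in the report.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F ratio for equal-sized groups.

    `groups` holds one list of scores per distortion level; each list
    has S*N entries in the notation of the text.
    """
    d = len(groups)
    m = len(groups[0])  # scores per distortion level
    group_means = [mean(g) for g in groups]
    grand = mean(group_means)
    # Between-distortion mean square, D-1 degrees of freedom.
    between = m * sum((gm - grand) ** 2 for gm in group_means) / (d - 1)
    # Within-distortion mean square, D(SN-1) degrees of freedom.
    within = sum((x - gm) ** 2
                 for g, gm in zip(groups, group_means)
                 for x in g) / (d * (m - 1))
    return between / within


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for samples a and b."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled = ss / (na + nb - 2)
    return (mb - ma) / (pooled * (1 / na + 1 / nb)) ** 0.5
```

The resulting values would then be compared against tabulated F and t
distributions with the corresponding degrees of freedom.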

4.5  The  Experimental  Results 

Table  4.5.1  shows  the  results  of  the  first  scoring  procedure  as 
applied  to  the  three  tests.  A summary  of  the  distortions  for  each  test 
is  given  in  Table  4.5.2.  The  overwhelming  point  is  that  there  are 
7 Digit Test
                         DISTORTION (t)
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .29    (1)     X       1.86    2.00    2.71
         .42    (2)     *       X       .14     .86
         .43    (3)     *               X       .71
         .48    (4)     **                      X
LPF      .28    (1)     X       1.29    2.29    2.43
         .37    (2)             X       1.00    1.14
         .44    (3)     *               X       .14
         .45    (4)     **                      X
ADPCM    .29    (1)     X       2.00    3.14    4.43
         .43    (2)     *       X       1.14    2.43
         .51    (3)                     X       1.29
         .60    (4)     **      *               X

8 Digit Test
                         DISTORTION (t)
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .53    (1)     X       .37     .86     1.60
         .56    (2)             X       .49     1.23
         .60    (3)                     X       .74
         .66    (4)                             X
LPF      .55    (1)     X       1.36    1.96    2.22
         .66    (2)             X       .62     .89
         .71    (3)     *               X       .24
         .73    (4)                             X
ADPCM    .56    (1)     X       1.36    2.22    3.09
         .67    (2)             X       .86     1.73
         .74    (3)     *               X       .86
         .81    (4)     **                      X

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  = Significance at .05
** = Significance at .01


TABLE 4.5.1  RESULTS OF UNSCREENED FIRST SCORING TESTS
                              DISTORTION
TEST               (1)      (2)          (3)          (4)
ADDITIVE NOISE     NONE     10 dB SNR    8 dB SNR     5 dB SNR
LOW PASS FILTER    NONE     2.4 kHz      1.8 kHz      1.2 kHz
ADPCM              NONE     24 KBPS      16 KBPS      8 KBPS

TABLE 4.5.2  DISTORTION LEVELS FOR THE TEST DIGITS
ON THE THREE COMMUNICABILITY TESTS
very few significant results using this scoring scheme. The major
problem here turns out to be the subject variations. Some subjects
are so "bad" that they get practically no strings correct. Others
are so "good" that they never miss. It was hence decided to screen
out subjects whose average error rate was outside the range .3 < error
rate < .7. This left 10 subjects on the first test, 9 on the second,
and 10 on the third. The results for this scoring are shown in Table
4.5.3. Clearly, this screening improves the results, with a large
number of results significant at the .01 level. This same effect was
found to hold for the other two scoring procedures.
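
The screening rule amounts to a simple filter on each subject's average
error rate. The sketch below is illustrative only; the function name and
the dictionary layout (subject identifier mapped to average error rate)
are assumptions, while the bounds .3 and .7 are those quoted in the
text.

```python
def screen_subjects(error_rates, low=0.3, high=0.7):
    """Keep only subjects whose average error rate lies strictly
    between `low` and `high`."""
    return {subject: rate
            for subject, rate in error_rates.items()
            if low < rate < high}
```

Subjects who get practically no strings correct, or who never miss, fall
outside the range and are dropped before the statistics are computed.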

Tables 4.5.4 and 4.5.5 show the results from the second and third
scoring procedures. In these tests the subjects were screened exactly
as for the first scoring procedure. Several results are clear from
these two tests. First, both scoring procedures represent a considerable
improvement over the first procedure, with the third procedure having
a slight edge in significance. Second, the noise tests seem to have
less overall effect (less significance) than either the low pass
filter test or the ADPCM test. Third, the 7 digit test seems to be
generally more acceptable than the 8 digit test (higher significance
levels for the same number of subjects).

4.6  Conclusions

The purpose of this study was to ascertain the usefulness and cost
of the digit recall test as a communicability test for speech digitization
systems. The overall results must be stated to be that:

1.  For the rather severe variations in distortions used in this
test, it was easily possible to differentiate between systems.
7 Digit Test
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .29    (1)     X       2.68    3.35    4.70
         .41    (2)             X       .67     2.01
         .44    (3)                     X       1.34
         .50    (4)     *                       X
LPF             (1)     X       2.01    4.70    6.71
         .37    (2)     *       X       2.68    4.70
         .49    (3)     **      **      X       2.02
         .58    (4)     **      **      *       X
ADPCM    .28    (1)     X       3.58    5.59    8.05
         .44    (2)     **      X       2.01    4.47
         .53    (3)     **      *       X       2.46
         .64    (4)     **      **      *       X

8 Digit Test
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .50    (1)     X       1.49    2.42    2.80
         .58    (2)             X       .93     1.30
         .63    (3)     *               X       .37
         .65    (4)                             X
LPF      .51    (1)     X       2.24    4.10    4.47
         .63    (2)             X       1.86    2.24
         .73    (3)     **      *       X       .37
         .75    (4)     **                      X
ADPCM    .54    (1)     X       2.61    4.29    5.22
         .68    (2)     *       X       1.66    2.61
         .77    (3)     **      *       X       .93
         .82    (4)     **      *               X

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  Significance at .05
** Significance at .01


TABLE 4.5.3  RESULTS OF SCREENED FIRST SCORING TESTS
7 Digit
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .62    (1)     X       2.47    3.22    8.51
         .72    (2)     **      X       2.75    6.04
         .81    (3)     **      **      X       3.30
         .93    (4)     **      **      **      X
LPF      .60    (1)     X       4.67    6.32    10.44
         .77    (2)     **      X       1.65    5.77
         .83    (3)     **      *       X       4.12
         .98    (4)     **      **      **      X
ADPCM    .38    (1)     X       4.39    6.67    8.79
         .74    (2)     **      X       2.41    4.39
         .83    (3)     **      **      X       1.92
         .90    (4)     **      **      **      X

8 Digit
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .84    (1)     X       3.84    5.11    6.14
         .99    (2)     **      X       1.28    2.30
         1.04   (3)     **              X       1.02
         1.08   (4)     **      **              X
LPF      .82    (1)     X       5.37    7.19    8.55
         1.03   (2)     **      X       1.79    3.58
         1.10   (3)     **      *       X       1.79
         1.17   (4)     **      **      *       X
ADPCM    .83    (1)     X       4.60    6.65    8.18
         1.01   (2)     **      X       2.05    3.58
         1.09   (3)     **      **      X       1.53
         1.15   (4)     **      **      *       X

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  Significance at .05
** Significance at .01


TABLE 4.5.4  RESULTS OF THE SCREENED TESTS USING THE
SECOND SCORING METHOD



7 Digit
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .53    (1)     X       3.85    7.42    10.44
         .67    (2)     **      X       3.57    6.59
         .60    (3)     **              X       3.02
         .91    (4)     **                      X
LPF      .51    (1)     X       3.57    7.69    9.89
         .64    (2)     **      X       4.12    6.32
                (3)     **      **      X       2.20
                (4)     **      **      **      X
ADPCM    .52    (1)     X       4.94    7.69    9.61
         .70    (2)     **      X       2.75    4.67
         .80    (3)     **      **      X       1.96
         .87    (4)     **      **      *       X

8 Digit
                         DISTORTION
TEST     AV.            (1)     (2)     (3)     (4)
NOISE    .63    (1)     X       4.86    7.67    9.46
         .82    (2)     **      X       2.81    4.60
         .93    (3)     **      **      X       1.79
         1.00   (4)     **      **      *       X
LPF      .61    (1)     X       5.63    8.44    10.74
                (2)     **      X       2.81    5.11
         .94    (3)     **      **      X       2.30
         1.03   (4)     **      **              X
ADPCM    .60    (1)     X       5.63    8.6?    10.74
         .82    (2)     **      X       3.07    5.11
         .94    (3)     **      **      X       2.05
         1.02   (4)     **      **      **      X

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  = Significance at .05
** = Significance at .01

TABLE 4.5.5  RESULTS OF SCREENED TESTS USING THE THIRD SCORING METHOD
2.  The cost of this test is quite high when compared to other
speech quality and speech intelligibility tests.

3.  There is great subject variability, indicating that results
might be improved substantially by using a trained, well
documented crew of listeners.

4.  For this particular group of subjects, 7 digits seemed about
right. Clearly, however, for some 7 was too many, while for
others, 8 was too few.

5.  The test is a very unpleasant test in which to participate.

6.  The ability of digit recall tests to differentiate between
systems which are closely matched for performance is limited,
and would require considerable cost.

In summary, it may be said that, even though this type of
communicability test can be argued to be more appropriate than
subjective preference testing, and even though it is possible, as shown
in this study, to differentiate among distorting systems, still the
excessive cost of communicability testing required to obtain the
desired significance levels makes these tests unattractive.
APPENDIX A

SPEECH ACCEPTABILITY EVALUATION AT DYNASTAT:
THE DIAGNOSTIC ACCEPTABILITY MEASURE (DAM)

BACKGROUND 


It is a matter of common observation that user acceptance
of voice communications equipment depends on factors
other than speech intelligibility. Although a high degree of
intelligibility is generally a necessary condition, it is not
a sufficient condition of user acceptance. But until recently,
no generally satisfactory method of evaluating the overall
acceptability or "quality" of processed or transmitted speech
has been available. Among the previously available methods,
some are applicable only for certain types of speech signal
degradation. Others are of limited reliability. Virtually
none permits reliable system evaluation in absolute terms for
the diversity of processing techniques and transmissions
encountered in modern digital voice communications.

Under contract with the Defense Communications Agency,
Dynastat recently undertook to fill the need that existed
in the area of acceptability evaluation. The results of this
effort included the Paired Acceptability Rating Method (PARM)
and the Quality Acceptance Rating Test (QUART), both of which
provide improved reliability of measurement on an absolute scale
of acceptability. Having met the interim needs of the Narrow
Band Voice Consortium, they also served as valuable research
tools to clarify a number of crucial methodological issues and
to indicate possible means of further refining the technology
of speech evaluation.

Drawing on insights gained in the course of its contractual
activities with PARM and QUART, Dynastat continued under
its own auspices to further advance the technology of communication
system evaluation from the standpoint of overall speech
acceptability. These efforts culminated in the Diagnostic
Acceptability Measure (DAM).

THE DIAGNOSTIC ACCEPTABILITY MEASURE

The Diagnostic Acceptability Measure combines direct
(isometric) and indirect (parametric) approaches to acceptability
evaluation by means of a twenty-item system rating form.*
Ten of the items on the form are concerned with the
acceptability-related perceptual qualities of the speech signal
itself. Seven items are concerned with the perceptual qualities
of the background. Three items are concerned with the perceived
intelligibility, pleasantness, and overall acceptability of the
total effect. The descriptors used to define the various perceptual
qualities are the end products of an extensive program of
research concerned with the nature of these qualities and with
the development of a precise vocabulary for characterizing them.

The results of further research have indicated that
listeners' perceptions of modern digital voice communication
systems and diverse forms of laboratory degradation can be
exhaustively characterized in terms of six elementary perceived
qualities of the signal and three perceived qualities of the
background. Measures of these elementary qualities are
obtained by various combinations of rating scale data.

* The isometric approach requires the listener to provide a
direct subjective assessment of the acceptability of a sample
speech transmission. The parametric approach requires the
listener to evaluate the sample transmission with respect to
various perceived characteristics or qualities (e.g., noisiness)
independently of his individual affective reactions to these
qualities. Hence, the parametric approach tends to minimize
the sampling error associated with individual differences in
"taste." The individual who does not personally place a high
valuation on a particular speech quality may nevertheless
provide information of use in predicting the typical
individual's acceptance of speech characterized by a given degree
of that perceptual quality.

In  accordance  with  the  above  research  results,  DAM 
rating  data  are  presently  analyzed  to  yield  system  diagnoses 
with  respect  to  the  nine  perceptual  qualities  indicated  in 
Table  1.  The  contribution  of  each  of  these  qualities  to  the 
listener's  acceptance  reaction  has  been  determined,  so  that  each 
diagnostic  score  can  be  expressed  in  terms  of  the  level  of 
acceptability  a system  would  be  accorded  if  it  were  deficient 
with  respect  only  to  the  single  perceptual  quality  involved. 
Expressed  in  this  way,  the  pattern  of  diagnostic  scores  reflects 
the  relative  contribution  of  each  perceptual  quality  to  the 
acceptability  of  the  system,  and  permits  the  system  developer 
to  concentrate  on  the  perceived  characteristics  of  his  system 
which  are  most  detrimental  to  its  acceptance. 

The application of multiple, nonlinear regression
techniques to a set of diagnostic scores permits the derivation of
supplementary, parametric estimates of intelligibility,
pleasantness, and acceptability, which can be combined with direct, or
isometric, rating data to yield highly reliable and valid estimates
of all three of these properties. For practical purposes of
system evaluation, however, parametric predictions are presently
provided only for acceptability.

To permit comparisons with the results of tests
previously conducted with PARM, DAM acceptability results are
transformed to their PARM equivalents. A transformation of judged
intelligibility results permits estimates of equivalent DRT total
scores.

Rigorous procedures for monitoring and screening of
listening crew members contribute significantly to the
reliability of DAM results.


TABLE 1. SYSTEM CHARACTERISTICS EVALUATED BY DAM


SIGNAL QUALITIES

Diagnostic  Typical                                     Intrinsic Effect
Scale       Descriptor   Exemplar                       On Acceptability

SF          Fluttering   Interrupted or Amplitude       Moderate
                         Modulated Speech
SH          Thin         High Pass Speech               Mild
SD          Rasping      Peak Clipped Speech            Severe
SL          Muffled      Low Pass Speech                Mild
SI          Interrupted  Packetized Speech              Moderate
                         with "Glitches"
SN          Nasal        2.4K bps Systems               Moderate


BACKGROUND QUALITIES

Diagnostic  Typical                                     Intrinsic Effect
Scale       Descriptor   Exemplar                       On Acceptability

BN          Hissing      Noise Masked Speech            Moderate
BB          Buzzing      Tandemmed Digital Systems      Moderate
BF          Babbling     Narrow Band Systems            Severe
                         with Errors
BR*         Echoic       Multipath Transmission         ?


TOTAL EFFECT

Scale

Intelligibility
Pleasantness
Acceptability

* Tentative scale, still under investigation.
Speaker differences are relatively small with DAM,
particularly within sexes. Depending on the purposes of the
investigator, however, the use of more than one speaker may
be appropriate.

The speech materials used for purposes of DAM
evaluations consist of 12 phonemically controlled sentences, spoken
by each of the desired number of speakers. Approximately one
minute total running time is required for each speaker.

Figure 1 shows the standard format in which DAM results
are reported. Presented first are the basic diagnostic scores
and their standard errors. Each diagnostic score represents
one estimate of the acceptability rating the system being
evaluated would receive if it were deficient only with respect to
the corresponding perceptual quality. Summary scores,
representing the combined effects of signal qualities and background
qualities, respectively, are also shown. Gross scores relating
to acceptability, judged pleasantness, and judged intelligibility
are shown in the bottom half of the figure.

Isometric  scores  are  based  only  on  direct  ratings  of 
the  respective  characteristics. 

Parametric scores are based on predictions of
acceptability from combined diagnostic scores for signal quality and
combined diagnostic scores for background quality.

Composite  scores  for  acceptability  are  based  on 
isometric  scores  for  acceptability,  parametric  scores  for 
acceptability,  and  on  composite  ratings  of  pleasantness  and 
intelligibility. 
Equivalent PARM scores and Equivalent DRT scores are
currently obtained by simple linear regression techniques
applied to composite acceptability scores and isometric
intelligibility ratings, respectively. However, it is expected that
more precise estimates of DRT scores will be obtained in the
future through the application of multiple prediction techniques
to the DAM diagnostic scores. Fig. 2 shows the correlation
between DAM acceptability scores (composite) and PARM test
results for a sample of modern digital voice communication
systems. Fig. 3 shows the correlation between isometric
intelligibility ratings and DRT total scores.
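
The simple linear regression step described above can be sketched as follows. This is only an illustration of the technique, not Dynastat's procedure: the function names are hypothetical, and the paired scores are made-up placeholders rather than data from any DAM or PARM test.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


def equivalent_score(composite, a, b):
    """Map a composite acceptability score to its regression equivalent."""
    return a + b * composite


# Hypothetical paired scores for systems tested both ways (assumed values).
dam_composite = [40.0, 50.0, 60.0, 70.0]
parm_scores = [30.0, 45.0, 60.0, 75.0]

a, b = fit_line(dam_composite, parm_scores)
estimate = equivalent_score(55.0, a, b)
```

The same fit, applied to isometric intelligibility ratings paired with DRT totals, would give the second transformation mentioned in the text.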

DAM evaluations have been performed on an extremely
broad sample of state-of-the-art narrow band and broad band
digital voice communication systems. Norms for various
conditions of speech/noise ratio, band restriction, and other simple
forms of signal degradation have also been established. These
normative data provide Dynastat with a truly unique capability
for detailed, useful interpretation of DAM results for future
experimental systems or conditions. Research, contemplated and in
progress, will serve to expand DAM's range of application and
provide norms for yet-to-be-encountered processing techniques
and transmission conditions.

For further information regarding the technical aspects
of the DAM and the evaluation services Dynastat offers with
it, please contact:


Dr. William D. Voiers
Dynastat, Inc.
2704 Rio Grande, Suite 4
Austin, Texas 78705

Phone: (512) 476-4797

Administrative or contractual information relating
to Dynastat's services with the DAM may be obtained from
Mr. Ira L. Panzer at the same address and phone number.
APPENDIX B


DERIVATION OF THE PROBABILITY DENSITY FUNCTION
FOR THE STUDENTIZED RANGE STATISTIC


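From Figure 3.1, let

$$ x_{\max} = \alpha, \qquad x_{\min} = \beta. \tag{B.1} $$

Then

$$ Z = \alpha - \beta \tag{B.2} $$

and

$$ f_Z(z) = \int_{-\infty}^{\infty} f_{\alpha\beta}(z + y,\, y)\, dy $$

as shown in Papoulis (B.1).

Now the cumulative distribution of $\alpha$ and $\beta$ is

$$ F_{\alpha\beta}(x, y) = P\left(\{x_i \le x \ \forall i\} \cap \{x_i \le y \text{ for at least one } i\}\right) $$

$$ = P\left(\{x_i \le x \ \forall i\} \cap \{x_i > y \ \forall i\}^{c}\right) $$

$$ = F^{N}(x) - \left[F(x) - F(y)\right]^{N}, \qquad x \ge y. $$

Then the joint probability density of $\alpha$ and $\beta$ is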
A minicomputer-based Digital Signal Processing Laboratory has
been under construction at Georgia Tech since August 1973. It is now
an extensive hardware-software complex dedicated to research and
instruction in many digital signal processing and minicomputer related
areas. This appendix describes briefly the elements of this system.

The system is based upon three minicomputers: an Eclipse S230
with 64K of 16-bit memory and a NOVA 830 with 64K of 16-bit memory in
the Research Lab, and a NOVA 820 with 32K of 16-bit memory in the
Student Lab. The uses of these computers are numerous and diverse.
Hence, the various hardware and software components of the system will
be presented separately.

THE RESEARCH COMPUTERS

A block diagram of the basic research computer facility is
shown in Figure 1. Included in this section are only those peripherals
which are used by many applications. A full set of peripherals is
listed in Table 1.

The computational power for the system is supplied by two
grounds of the Eclipse S230, which has 64K of 16-bit semiconductor
memory (+ cache), a floating-point processor, hardware
multiply-divide, a memory management unit, and writable control storage
(for microprogramming the processor), and by one ground of the NOVA
830, which has a floating-point processor, hardware multiply-divide,
a memory management unit, and 64K of 1 µsec 16-bit memory. Bulk
storage is supplied by three discs. The main disc is a 192 M byte
moving head drive shared by the Eclipse and the NOVA 830. Each of the
other two disc units is of the moving head type, and each has one
fixed and one removable pack. The Diablo model 44 disc has 10 M byte
capacity, and is used by the Eclipse alone. The Diablo model 33 has
5 M byte capacity, and is shared by the NOVA 830 and the NOVA 820
(instructional) computers. Additional bulk storage is supplied by
two tape units, a NOVA cassette tape and a 7-track digital unit (a
9-track unit is on order from Data General). The cassette is a standard
Data General peripheral, while the 7-track was interfaced at Georgia
Tech.


FIGURE 1

The Basic System for the Research Laboratory


TABLE 1

I/O DEVICES ON THE NOVA 830 I/O BUSS


DATA GENERAL INTERFACES

Diablo 33 disc controller (5 M bytes)
Diablo 44 disc controller (10 M bytes)
NOVA cassette controller
Real time clock
Floating-point arithmetic unit
Memory management
Data General mag tape controller
RS-232 interface at 9600 baud
RS-232 interface at 1200 baud
Inter-processor buss
Comtal video system interface


INTERFACES CONSTRUCTED AT GEORGIA TECH

Programmable sampling clock
RS-232 variable baud clock
Joy stick interface
Light pen interface
Button box interface
RS-232 interface (2)
16 bit double buffered D-to-A
10 bit single buffered D-to-A (4)
A-to-D/sample and hold/analog multiplexer
Ampex analog tape deck control
Revox analog tape deck control
Crown analog tape deck control
Kennedy 7-track digital tape interface
Line printer interface
Card reader interface
Paper tape reader interface
Programmable stack (256 words)
Quality test interface
Universal card tester interface
Time-of-day and date clock
Control card testing interface

Additional  general  purpose  devices  include  a card  reader,  a 
line  printer,  a paper  tape  reader,  and  a paper  tape  punch.  These 
units  were  all  interfaced  at  Georgia  Tech. 

The foreground of the NOVA 830 is used as a general peripheral
control ground for sharing the scarce peripherals. Almost all of the
general purpose and special purpose peripherals in the system are
interfaced to the NOVA 830 (see Table 1), and this ground allows
all the other grounds on the other computers in the system to access
these peripherals.

THE  GRAPHICS  SUBSYSTEM 

One  of  the  major  design  criteria  for  this  system  was  a high 
level  of  high  speed  graphical  interaction  between  the  user  and  the 
computer.  Figure  2 shows  the  hardware  associated  with  the  graphical 
subsystem. 

This system supports many types of graphical interaction.
First, it supports line printer graphics both in the axis-graph mode
and in the X-Y-Z mode for picture reproduction. Second, the Tektronix
4010 graphical unit gives storage type vector graphics at 9600 baud
and cross hair feedback interaction. Third, refresh graphics is
supplied by driving X-Y-Z CRT's directly from 3 of the D-to-A's. A
light pen (built at Georgia Tech), along with two joy sticks, 3 button
boxes, and two potentiometers, gives interaction in the refresh mode.
Fourth, a CALCOMP incremental plotter (interfaced at Georgia Tech)
gives hard copy capability in the vector and character modes. Last,
a Comtal video processor gives X-Y-Z CRT support on a 512x512 display
with eight bits resolution.

THE AUDIO SUBSYSTEM

A diagram of the audio subsystem is given in Figure 3. This
subsystem was constructed as an aid to interactive speech processing.

The whole system is centered on a patch bay located with the
NOVA 830. This patch bay gives the user great flexibility in
interconnecting the individual audio components.

Data acquisition is handled through a 12-bit (10 µsec) A-to-D
with an 8-channel analog multiplexer on its input. Data playback is
handled by a 16-bit double buffered D-to-A. The sampling rate on
these two units is controlled by a programmable clock. Four additional
channels of 8 bit D-to-A's form single buffered analog outputs. The
entire data acquisition and playback system was built at Georgia Tech.
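
The figures quoted for the A-to-D can be turned into a small worked example. The sketch below assumes a bipolar input normalized to a full scale of 1.0, two's complement output codes, and that the 10 µsec figure is the conversion time; none of these details is stated in the text, and the function names are hypothetical.

```python
CONVERSION_TIME_S = 10e-6   # 10 microsecond figure quoted for the A-to-D
ADC_BITS = 12               # resolution quoted for the A-to-D


def max_sample_rate_hz(conversion_time_s=CONVERSION_TIME_S):
    """Upper bound on the sampling rate if every conversion must complete."""
    return 1.0 / conversion_time_s


def quantize(sample, bits=ADC_BITS, full_scale=1.0):
    """Map an analog value to a signed integer code, clipping at the rails.

    Assumes a bipolar converter: codes run from -2**(bits-1) to 2**(bits-1)-1.
    """
    levels = 1 << (bits - 1)                      # 2048 for 12 bits
    code = int(round(sample / full_scale * levels))
    return max(-levels, min(levels - 1, code))


def step_size(bits=ADC_BITS, full_scale=1.0):
    """Size of one quantization step in input units."""
    return full_scale / (1 << (bits - 1))
```

Under these assumptions the converter could not be clocked faster than about 100 kHz, and the programmable clock would be set well below that bound for speech work.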

Four analog tape drives are available for use with the system.
Two of these, a Crown 800 and a Revox tape drive, are interfaced so
they may be controlled by the computer. The Crown interface allows
the positioning of the tape to any desired position (within tape
stretch). Either of the two Ampex drives may be used under computer
control in place of the Revox.

Four variable filters and three audio amplifiers are also
available for use with this system.


FIGURE 3

The Audio Subsystem on the NOVA 830

SPEECH QUALITY TEST SUBSYSTEM

The speech quality test subsystem depicted in Figure 4 is
designed for the automated control of subjective quality tests. The
subsystem consists of six stations, located in a separate speech quality
laboratory and controlled by the NOVA 830 computer. Each of the
stations has a CRT, 15 response buttons, a "ready" button, ear phones,
and a volume control for each ear. The computer interface can read the
buttons at any station, clear and set the ready flip flop, and, using a
software character generator, display messages to the subjects on the
CRT's.

This quality system has several distinct advantages over a
non-automated system. First, it eliminates much of the hand work on
data reduction. Second, it allows on-line statistical analysis. Last,
it allows the subjective test to reconfigure itself based on the subject
responses.
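
The on-line analysis and self-reconfiguration mentioned above can be sketched as a running-statistics accumulator with a stopping rule. This is an illustrative assumption about how such a test might be driven, not a description of the laboratory's software; the class, the function, and the thresholds are all hypothetical.

```python
class RunningStats:
    """Running mean and variance of responses (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # running sum of squared deviations

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Unbiased sample variance (0.0 until two responses exist)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def std_error(self):
        """Standard error of the mean."""
        return (self.variance() / self.n) ** 0.5 if self.n > 0 else 0.0


def needs_more_trials(stats, target_std_error, min_trials=3):
    """Hypothetical rule: keep presenting a condition until the mean
    rating is known to within the requested standard error."""
    return stats.n < min_trials or stats.std_error() > target_std_error
```

A controlling program could keep one accumulator per test condition, update it as each button response is read, and drop a condition from the presentation schedule once `needs_more_trials` returns false.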

THE  OPTICAL  DATA  PROCESSING  SUBSYSTEM 

A diagram of the optical data processing facility is given in
Figure 5. This subsystem has three components. The first component,
the "picture acquisition" component, consists of a Micro NOVA
microcomputer (in Dr. William Rhodes' laboratory) which controls an
electro-mechanical scanner. This equipment is still under development.
Second, the Micro NOVA also controls an optical data digitizer for
picture acquisition. The third component in this system is the
"picture playback" facility. This facility consists of 3 D-to-A's
and two CRT scopes. One CRT is of the storage type, and allows quick
viewing of the pictures being displayed. The second CRT is equipped
with a scope camera. The interchangeable backs on this camera allow
the production of either Polaroid or 120 roll film pictures. The
Comtal video system can also be used to produce pictures.


FIGURE 4

The Speech Quality Testing Subsystem
(each station: CRT, 15 buttons, ready button, ear phones and volume control)

THE COMPUTER NETWORK SUBSYSTEM

A "star" computer network is currently under development in the
digital signal processing laboratory. The basic hardware for this
system is shown in Figure 6. The NOVA 830 communicates with the
Eclipse through an interprocessor buss (IPB), and with several other
computers through high speed, variable baud rate, RS-232 standard,
asynchronous, serial interfaces. These RS-232 interfaces were designed
and built at Georgia Tech, and are capable of speeds up to 152K baud.

The  hardware  for  this  system  exists  and  is  tested.  The  software 
is  currently  under  development. 

THE UNIVERSAL CARD TESTER AND THE HARDWARE PHILOSOPHY

One of the most important subsystems of the digital signal
processing laboratory is the universal card tester. To understand how this
is used, it is important to understand the hardware philosophy of the
laboratory. Most of the hardware constructed in the laboratory is
constructed in prebuilt chassis. Each chassis contains 40 56-pin
connectors. The computer I/O buss enters each chassis and is split
into 3 sub-busses, called the "data buss," the "control buss," and the
"address buss." If this is not the final chassis on the daisy chain,
the busses are regrouped, and extended to the next chassis.


FIGURE 7

The Universal Card Tester
(card tester, switch panel, patch panel)

The hardware interfaces constructed in the chassis are mostly
constructed from pre-designed printed circuit boards. A list of the
PC cards available for interface construction is given in Table 2.
Most interfaces consist of using some set of "standard" cards with,
perhaps, some additional construction.

The main problem in hardware construction, therefore, is in
building and testing the "standard" cards, often with semi-skilled
labor. This is the purpose of the universal card tester.

A diagram of the universal card tester is given in Figure 7.
The tester has a switch panel, a patch panel, and a single "standard"
56-pin connector, each pin of which can serve as an "input," an
"output," or an "external." Each pin has a parallel connection to the
patch panel for external connection. The computer can read or write
individual bits to any pin position. Hence, any desired input/output
sequence can be presented to a card being tested, and the results can
be read back by the computer.

The software package associated with the card tester allows the
user to test and debug any of the standard cards. In addition, a
special card allows the testing of individual integrated circuit chips.
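
The test procedure described above, in which bit patterns are written to chosen pins and the responses are read back and compared, can be sketched in software. Everything in this sketch is hypothetical: the class and function names, the pin assignments, and the simulated card (a single AND gate) stand in for the real tester and its software package, which the text does not describe in detail.

```python
NUM_PINS = 56   # pins on the "standard" connector


class SimulatedAndCard:
    """Stand-in for a card under test: pin 3 outputs pin 1 AND pin 2."""

    def respond(self, pins):
        pins[3] = pins[1] & pins[2]


class CardTester:
    """Minimal model of a tester that reads and writes individual pins."""

    def __init__(self, card):
        self.card = card
        self.pins = [0] * (NUM_PINS + 1)   # index 0 unused; pins are 1..56

    @staticmethod
    def _check(pin):
        if not 1 <= pin <= NUM_PINS:
            raise ValueError("pin number out of range")

    def write_pin(self, pin, bit):
        self._check(pin)
        self.pins[pin] = bit & 1
        self.card.respond(self.pins)       # card outputs settle after a write

    def read_pin(self, pin):
        self._check(pin)
        return self.pins[pin]


def run_vectors(tester, vectors):
    """Apply (inputs, expected) pairs; return a list of mismatches."""
    failures = []
    for inputs, expected in vectors:
        for pin, bit in inputs.items():
            tester.write_pin(pin, bit)
        for pin, want in expected.items():
            got = tester.read_pin(pin)
            if got != want:
                failures.append((inputs, pin, want, got))
    return failures


# Truth table for the simulated AND card, written as test vectors.
AND_VECTORS = [
    ({1: 0, 2: 0}, {3: 0}),
    ({1: 0, 2: 1}, {3: 0}),
    ({1: 1, 2: 0}, {3: 0}),
    ({1: 1, 2: 1}, {3: 1}),
]
```

An empty list from `run_vectors` means the card passed every vector; each entry otherwise records the stimulus, the pin, and the expected and observed bits, which is the information a technician would need to localize a fault.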

THE  BASIC  INSTRUCTIONAL  COMPUTER  (NOVA  820) 

The NOVA 820 computer and its associated peripherals form a
computer and signal processing facility dedicated to student activities.
These activities mainly include several laboratories associated with
course and student project work. The hardware is configured so as to
allow maximum utilization of the software developed in the research
laboratory.


TABLE 2

STANDARD PC CARDS USED IN THE MODULAR CONSTRUCTION SYSTEM


CARD NAME          PURPOSE

Single Address     Address decode
Dual Address       Address decode
Control            Interrupt control
Input buffer       16 bit input buffer
Output buffer      16 bit output buffer
DMA                Direct memory access control
Counter            16 bit up/down counter
Memory             256x256 bit high speed memory (43 msec)
RS-232 (1)         High speed serial converter
RS-232 (2)         Medium speed serial converter
M6800 CPU          Micro-processor CPU
M6800 Memory (1)   Micro-processor memory (4K RAM)
M6800 Memory (2)   Micro-processor memory (4K RAM, 4K EPROM)
M6800 Buffer       Micro-processor buffer
M6800 Control      Micro-processor interrupt control
Kluge              General purpose
Figure 8 shows the basic NOVA 820 computer system and Table 3
gives a list of peripherals. The CPU has 32K of 800 nsec memory and a
hardware multiply-divide unit. Bulk storage is formed by two
moving-head disc drives totaling 5 M bytes of storage. These discs are shared
with the NOVA 830, and communication between the processors is
maintained on a high speed RS-232 port.

Many of the peripherals have been constructed so as to be
identical, from a computer command viewpoint, to those on the research
facility. Hence, the D-to-A's, the double buffered D-to-A's, the
A-to-D, the A-to-D 8-channel analog multiplexer, and the programmable
clock all utilize the same commands as their counterparts on the NOVA
830. These peripherals give the NOVA 820 a similar audio and refresh
graphics capability to the NOVA 830.

Interactive graphics on the NOVA 820 is handled by an M6800
controlled plasma terminal designed to look like a Tektronix 4010.
Hence, all the graphics packages developed for the NOVA 830 will run
on the NOVA 820.

THE MICRO-COMPUTER SUBSYSTEM (M6800)

One of the most important developments in modern control
technology has been the development of the micro-processor. The
micro-processor subsystem of the student (NOVA 820) laboratory was developed
with three purposes:

1. To develop a micro-processor board set for use as
a general interfacing tool.

2. To develop a hardware interface between the NOVA 820
and a micro-processor and to develop software for
the NOVA 820 which allows simple, interactive
software development for the micro-processor.

3. To develop software for the micro-processor to do
the graphics and character generation tasks related
to the plasma scope.


TABLE 3

I/O DEVICES ON THE NOVA 820 I/O BUSS


DATA GENERAL INTERFACES

Diablo 33 disc controller
RS-232 interface at 1200 baud
Inter-processor buss


INTERFACES CONSTRUCTED AT GEORGIA TECH

Programmable sampling clock
Light pen interface
16 bit double buffered D-to-A
10 bit single buffered D-to-A (4)
A-to-D/sample and hold/analog multiplexer
Line printer/M6800 input interface
M6800 micro-computer CPU
M6800 4K memory module (2)
M6800 control and communication interface
Plasma display interface

All three of these purposes have been accomplished. Future
goals for the subsystem include the addition of another 8 bit
micro-processor board (8080A) and the development of a system based on the
new Data General 16 bit micro-processor.

A diagram of the hardware associated with the micro-processor
is shown in Figure 9. Through a general interface to the
micro-processor's buss, the NOVA 820 can completely control the
micro-processor and load and examine the micro-processor memory. Through a
standard interrupt interface, the NOVA 820 can communicate with the
micro-processor as it would any other peripheral. This environment
allows great flexibility in the use of the micro-processor.

The micro-processor itself has 8K of 8 bit, 1 µsec memory, an
interrupt I/O port, and a 16 bit I/O buffer. Expansion of the hardware
and software for this subsystem is continuing.
APPENDIX D


SOFTWARE SUMMARY


PROGRAM NAME: ACUNP
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
I       G     INPUT STARTING ADDRESS FROM TTY
R       G     DATA IS REAL (ASSUME INTEGER OTHERWISE)
O       L     OUTPUT (CONTIGUOUS) FILE -- MUST COME FIRST

PURPOSE
TO CONCATENATE A SET OF CONTIGUOUS FILES INTO A SINGLE OUTPUT


PROGRAM NAME: ACONTS
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
R       G     DATA IS REAL -- ASSUMED INTEGER OTHERWISE
O       L     CONTIGUOUS OUTPUT FILE

PURPOSE
TO CONCATENATE A SET OF CONTIGUOUS INPUT FILES OF INTEGRAL NUMBER
OF CYLINDERS INTO A SINGLE OUTPUT FILE


PROGRAM NAME: ADPCM
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
P       L     PITCH FILE
I       L     INPUT FILE (SPEECH)
O       L     OUTPUT FILE (SPEECH)
C       L     FEEDBACK COEFFICIENT FILE
X       L     QUANTIZED ERROR OUTPUT FILE
E       L     ERROR OUTPUT FILE
M       L     MULTIPLIER OUTPUT FILE
D       L     DATA FILE
L       L     LISTING FILE

PURPOSE
TO SIMULATE GENERAL ADPCM SYSTEMS. SYSTEM IS CONFIGURED BY D
AND INPUT/OUTPUT FILES (E.G. IF A /P FILE IS PRESENT, A PITCH S
ERROR CORRECTION IS DONE)


PROGRAM NAME: CPITCH
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
O       L     OUTPUT PITCH FILE

PURPOSE
TO CREATE A CONSTANT PITCH CONTOUR.


PROGRAM NAME: DECK
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
P       G     PLAY
R       G     RECORD
F       G     FAST FORWARD
S       G     FAST BACKWARD
C       G     USE CROWN INSTEAD OF AMPEX

PURPOSE
ANALOGUE TAPE DRIVE CONTROL PROGRAM.


PROGRAM NAME: DCAADIN
LANGUAGE: FORTRAN
DATE: 6/9/77
AUTHOR: T. P. BARNWELL
CATEGORY: GENERAL

PURPOSE
THIS IS AN INTERACTIVE PROGRAM FOR TRANSFERRING DATA FROM IBM
SPEECH DATA TAPES, ORIGINATING AT DCA, TO DATA GENERAL CONTIGUOUS
FILES.
THE PROGRAM IS INTERACTIVE AND SELF EXPLANATORY


PROGRAM NAME: DCAAV
LANGUAGE: FORTRAN
DATE: 6/9/77
AUTHOR: T. P. BARNWELL
CATEGORY: GENERAL

PURPOSE
THIS PROGRAM COMPUTES THE AVERAGE OF MANY OBJECTIVE
MEASURES COMPUTED BY OBJETIVE AND nCJ2. ITS PURPOSE IS TO
GET AN OVERALL MEASURE FROM MANY SINGLE WINDOWED ERRORS


PROGRAM NAME: DCATAPEIN
LANGUAGE: FORTRAN
DATE: 6/9/77
AUTHOR: T. P. BARNWELL
CATEGORY: GENERAL

PURPOSE
THIS IS AN INTERACTIVE PROGRAM TO TRANSFER AN IBM 9 TRACK
TAPE CODED IN EBCDIC TO AN ASCII FILE ON RDOS FILE STRUCTURE


PROGRAM NAME: DATAMAKE
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
I       L     INPUT INSTRUCTION FILE
O       L     OUTPUT INSTRUCTION FILE
D       L     DATA FILE

PURPOSE
TO MAKE A NEW DATA FILE FOR THE SYSTEMATIC TESTING OF
ANY SYSTEM.
PROGRAM NAME: DATASTART
LANGUAGE: FORT
CATEGORY: GENERAL

PURPOSE
INTERACTIVE PROGRAM FOR CREATING CONTROL FILE FOR DATAMAKE.


PROGRAM NAME: DFDP
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
I       R A   INPUT DATA FILE (OPTIONAL)
O       R A   OUTPUT FILTER COEFFICIENTS
M       R A   MAGNITUDE SPECTRUM (OPTIONAL)
P       R A   PHASE SPECTRUM (OPTIONAL)

PURPOSE
DESIGNS DIGITAL FILTERS


PROGRAM NAME: DFD
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
I       R A   INPUT DATA FILE (OPTIONAL)
O       R A   OUTPUT FILTER COEFFICIENTS
M       R A   MAGNITUDE SPECTRUM (OPTIONAL)
P       R A   PHASE SPECTRUM (OPTIONAL)

PURPOSE
DESIGNS DIGITAL FILTERS


PROGRAM NAME: DOWN
LANGUAGE: FORT
CATEGORY: SPEECH

LOWER ORDER BITS, AND/OR DROP EVERY OTHER OR B OUT 0
3P BITS TO REDUCE SAMPLING FREQUENCIES.


PROGRAM NAME: FILTER
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
              INPUT FILE
              RESULT FILE
              DATA FILE

PURPOSE
GENERAL CANONICAL FORM DIGITAL FILTER PROGRAM.


PROGRAM NAME: FNORM
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
              INPUT FILE
              RESULT FILE
              DATA FILE

PURPOSE
TO NORMALIZE A FLOATING POINT FILE.


PROGRAM NAME: FFILTER
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
I       L     INPUT FILE
        L     RESULT
D       L     DATA FILE

PURPOSE
FOREGROUND VERSION OF FILTER.


PROGRAM NAME: FILMPY
LANGUAGE: FORT
CATEGORY: GENERAL

SWITCH  TYPE  PURPOSE
O       R A   OUTPUT FILTER COEFF
M       R A   MAGNITUDE SPECTRUM (OPTIONAL)
P       R A   PHASE SPECTRUM (OPTIONAL)

PURPOSE
PUTS TOGETHER ANY NUMBER OF DIGITAL FILTERS TO MAKE
ONE FILTER (CASCADE). INPUT FILTER FILES HAVE NO SWITCHES.
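
Cascading filters as described for FILMPY amounts to multiplying their transfer functions. The sketch below is an illustration of that idea only, not the FORTRAN program itself: it assumes each filter is given as a pair of coefficient lists (numerator, denominator) in ascending powers of z^-1, and all names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (convolution)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def cascade(filters):
    """Combine (numerator, denominator) pairs into a single filter.

    The cascade of H1(z) = B1/A1 and H2(z) = B2/A2 is (B1*B2)/(A1*A2).
    """
    num, den = [1.0], [1.0]
    for b, a in filters:
        num = poly_mul(num, b)
        den = poly_mul(den, a)
    return num, den


# Two first-order FIR sections: (1 + z^-1) followed by (1 - z^-1).
num, den = cascade([([1.0, 1.0], [1.0]), ([1.0, -1.0], [1.0])])
```

For high-order filters a product form of this kind is sensitive to coefficient rounding, which is one reason cascades are often kept as separate sections in practice.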


PROGRAM  NAME:  FILPLT 

LANGUAGE:  FORT 

CATEGORY:  GENERAL 


PURPOSE 

F-SWAP  PROGRAM  FOR  DFDP 


PROGRAM  NAME:  0000 

LANGUAGE: FORT

CATEGORY:  SPEECH 


PURPOSE 

TO  INITIALIZE  THE  CLOCK  AND  A/D  CHANNEL. 


PROGRAM NAME: HEAR
LANGUAGE: ASM
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
*       L     SEE *

PURPOSE
TO SAMPLE INPUT ANALOGUE DATA

* SWITCH DETERMINES SIZE OF SAMPLE IN CYLINDERS
  A=1, ETC


PROGRAM NAME: HLPD
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
I       L     INPUT SPEECH DATA
P       L     OUTPUT PITCH DATA
D       L     DATA FILE
L       L     LISTING FILE

PURPOSE
HARD LIMITED AUTOCORRELATION PITCH DETECTOR
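
The general idea behind a hard limited autocorrelation pitch detector can be sketched as follows. This is not a reconstruction of HLPD: the three-level limiter, the clipping fraction, the voicing threshold, and every name here are assumptions chosen for illustration.

```python
import math


def hard_limit(frame, threshold):
    """Reduce each sample to +1, 0, or -1 about a clipping threshold."""
    return [1 if s > threshold else -1 if s < -threshold else 0
            for s in frame]


def pitch_period(frame, min_lag, max_lag,
                 clip_fraction=0.3, voicing_fraction=0.3):
    """Return the lag (in samples) of the autocorrelation peak, or None.

    clip_fraction sets the limiter threshold relative to the frame peak;
    voicing_fraction is the minimum peak height, relative to the zero-lag
    value, for the frame to be called voiced. Both are assumed values.
    """
    peak = max(abs(s) for s in frame)
    if peak == 0:
        return None                       # silent frame
    limited = hard_limit(frame, clip_fraction * peak)
    energy = sum(v * v for v in limited)  # autocorrelation at lag zero
    best_lag, best_val = None, 0
    last_lag = min(max_lag, len(frame) - 1)
    for lag in range(min_lag, last_lag + 1):
        r = sum(limited[n] * limited[n + lag]
                for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag is None or best_val < voicing_fraction * energy:
        return None                       # treated as unvoiced
    return best_lag


# A synthetic periodic frame with a period of 80 samples.
frame = [math.sin(2 * math.pi * n / 80) for n in range(400)]
period = pitch_period(frame, min_lag=20, max_lag=160)
```

Because the limited signal takes only the values -1, 0, and +1, the lag products need no multiplications, which was the main attraction of this class of detector on small machines.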

PROGRAM NAME: HIRE
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
I       I A   INTEGER SPEECH INPUT FILE
O       I A   INTEGER IMPULSE RESPONSE OUTPUT
P       R A   DATA FILE (OPTIONAL)
L       L     LISTING (OPTIONAL)

PURPOSE
HOMOMORPHIC IMPULSE RESPONSE EXTRACTOR.


PROGRAM NAME: LPC
LANGUAGE: FORT
CATEGORY: SPEECH

SWITCH  TYPE  PURPOSE
I       L     INPUT SPEECH FILE
        L     COEF FILE
        L     PARCOR COEF. FILE
        L     AUTO. FILE
        L     DATA FILE
        L     LISTING FILE

PURPOSE
BASIC BLOCK SYNCHRONOUS AUTOCORRELATION/TOEPLITZ VOCODER TRA
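
The autocorrelation/Toeplitz analysis named in this entry is commonly carried out with the Levinson-Durbin recursion. The listing does not say which solver the LPC program uses, so the sketch below is an assumption offered for illustration; the function names and the sign convention (inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p) are choices made here, and PARCOR sign conventions differ between authors.

```python
def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor.

    Returns (a, parcor, err): a[0..order] with a[0] = 1 are the inverse
    filter coefficients, parcor holds the reflection coefficients, and
    err is the final prediction error power.
    """
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = float(r[0])
    parcor = []
    for m in range(1, order + 1):
        if err <= 0:
            break                          # perfectly predictable signal
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        parcor.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, parcor, err


# Autocorrelation of an ideal first-order process with pole at 0.5.
a, parcor, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In a block synchronous analyzer these two steps would be repeated for each frame, with the coefficient, PARCOR, and autocorrelation outputs corresponding to the file switches listed in the entry.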


PROGRAM NAME:  LPR
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


SWITCH  TYPE   PURPOSE

A       LOCAL  AREA FUNCTIONS
K       LOCAL  PARCOR COEFFICIENTS
C       LOCAL  FEEDBACK COEFFICIENTS
O       LOCAL  FEEDBACK COEFFICIENTS
D       LOCAL  BATCH (DATA) CONTROL FILE
R       LOCAL  AUTOCORRELATION COEFFICIENTS
P       LOCAL  PITCH FILE
L       LOCAL  LISTING FILE
X       LOCAL  EXCITATION OUTPUT FILE


PURPOSE 

THIS IS A GENERAL PURPOSE LPC RECEIVER PROGRAM. IT RECONFIGURES
ITSELF DEPENDING ON WHAT FILES APPEAR IN ITS INPUT COMMAND
LINE. IF ITS "X" LINES ARE COMPILED, THE PROGRAM CAN ADD
SEVERAL DISTORTIONS TO THE OUTPUT SPEECH, INCLUDING UNIFORM
BANDWIDTH DISTORTION AND UNIFORM FREQUENCY DISTORTION. IT MAY
THUS BE USED TO CORRECT HELIUM SPEECH OR INSTALL CONTROLLED
DISTORTIONS ON THE OUTPUT.
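
The catalog does not say how LPR applies its bandwidth distortion. One well-known way to change all formant bandwidths uniformly is to scale the k-th predictor coefficient by gamma to the power k, which moves every pole radius by the factor gamma. The Python sketch below shows only that generic idea. The function name and the `gamma` parameter are hypothetical, and this should not be read as the program's actual method.

```python
def scale_bandwidth(coeffs, gamma):
    """Replace A(z) by A(z / gamma): every pole radius is multiplied by gamma."""
    return [a * gamma ** k for k, a in enumerate(coeffs)]


# A single pole at z = 0.9 moves to z = 0.45 when gamma = 0.5.
distorted = scale_bandwidth([1.0, -0.9], 0.5)
```

Values of gamma below 1 pull poles toward the origin and widen the bandwidths; values above 1 narrow them and can make the synthesis filter unstable.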


PROGRAM NAME:  LOOK

LANGUAGE:  FORT

CATEGORY:  GENERAL


SWITCH  TYPE  PURPOSE

X       L     DATA FILE


PURPOSE

INTERACTIVE GRAPHICS INTERPRETER WHICH ALLOWS UP TO EIGHT PLOTS
BASED ON UP TO 4 FILES ON THE 4010 GRAPHICS TERMINAL.



PROGRAM NAME:  MBPD

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

A       I A   UNFILTERED SPEECH INPUT
B       I A   50-100HZ FILTERED SPEECH
C       I A   100-200HZ FILTERED SPEECH
D       I A   200-400HZ FILTERED SPEECH
E       I A   400-800HZ FILTERED SPEECH
I       I A   DATA FILE INPUT (OPTIONAL)
P       R A   PITCH CONTOUR OUTPUT
L       R     AVERAGE LEVEL INPUT (FROM MBPWR)


PURPOSE 

MULTI  BAND  PITCH  DETECTOR 


PROGRAM NAME:  MBPLOT

LANGUAGE:  FORT

CATEGORY:  SPEECH


PURPOSE

"F-SWAP" PROGRAM FOR USE WITH MBPD


PROGRAM NAME:  MBPWR

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

        I A   UNFILTERED SPEECH INPUT
        I A   50-100HZ FILTERED SPEECH INPUT
        I A   100-200HZ FILTERED SPEECH INPUT
        I A   200-400HZ FILTERED SPEECH INPUT
        I A   400-800HZ FILTERED SPEECH INPUT
        I A   LEVEL OUTPUT FILE


PURPOSE 

AVERAGE  MAGNITUDE  LEVEL  FOR  MBPD 
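
An average magnitude level is typically the mean of the absolute sample values over a frame. The short Python sketch below illustrates that calculation. The function name, the non-overlapping framing, and the frame length are assumptions, since the catalog does not describe MBPWR's framing.

```python
def average_magnitude(samples, frame_length):
    """Mean absolute value of each complete, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        levels.append(sum(abs(v) for v in frame) / frame_length)
    return levels


# The trailing partial frame (the single sample 3) is ignored.
levels = average_magnitude([1, -1, 2, -2, 3], 2)
```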


PROGRAM NAME:  NORM

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

I       L     INPUT FILE
R       L     RESULT FILE
D       L     DATA FILE


PURPOSE

TO NORMALIZE AN INTEGER FILE


PROGRAM NAME:  OBJECTIVE
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


SWITCH  TYPE   PURPOSE

M       LOCAL  MASTER FILE
S       LOCAL  SLAVE FILE
D       LOCAL  BATCH (DATA) FILE
L       LOCAL  LISTING FILE


PURPOSE

TO COMPUTE THE GAIN WEIGHTED AND NON GAIN WEIGHTED SPECTRAL
DISTANCE METRIC BETWEEN TWO SPECTRUM FILES. THE SPECTRUM
FILES ARE NORMALLY GENERATED BY LPC, PCEP, HIRE, OR SPCANA.
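
The catalog does not give the metric's formula. As a rough illustration only, the Python sketch below computes a per-frame RMS log spectral difference in dB between a master and a slave spectrum, then averages the frames either uniformly or weighted by a per-frame gain. All function names, the dB scaling, and the weighting scheme are assumptions and may differ from what OBJECTIVE computes.

```python
import math


def frame_distance(spec_a, spec_b):
    """RMS difference of two power spectra, measured in dB."""
    diffs = [10.0 * math.log10(a / b) for a, b in zip(spec_a, spec_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def spectral_distance(master, slave, gains=None):
    """Average the frame distances, optionally weighting each frame by its gain."""
    d = [frame_distance(m, s) for m, s in zip(master, slave)]
    if gains is None:
        return sum(d) / len(d)
    return sum(g * x for g, x in zip(gains, d)) / sum(gains)


# Two hypothetical frames with two spectral bins each.
master = [[1.0, 1.0], [10.0, 10.0]]
slave = [[1.0, 1.0], [1.0, 1.0]]
unweighted = spectral_distance(master, slave)
weighted = spectral_distance(master, slave, gains=[1.0, 3.0])
```

Gain weighting lets loud frames, which tend to matter more perceptually, dominate the average.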


PROGRAM NAME:  OBJ2
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


SWITCH  TYPE   PURPOSE

M       LOCAL  MASTER FILE
S       LOCAL  SLAVE FILE
D       LOCAL  BATCH (DATA)
L       LOCAL  LISTING FILE


PURPOSE

TO COMPUTE THE GAIN WEIGHTED AND NON GAIN WEIGHTED NON-SPECTRAL
DISTANCE METRIC BETWEEN TWO SPECTRUM FILES. THE NON-SPECTRUM
FILES ARE NORMALLY GENERATED BY LPC, PCEP, HIRE, OR SPCANA.


PROGRAM NAME:  PCEP
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


SWITCH  TYPE   PURPOSE

D       LOCAL  BATCH (DATA) CONTROL FILE
A       LOCAL  OUTPUT CEPSTRUM FROM A
M       LOCAL  MASTER INPUT
S       LOCAL  SLAVE INPUT (B)
B       LOCAL  OUTPUT CEPSTRUM FROM B
L       LOCAL  LISTING FILE
W       LOCAL  INPUT (ASCII) WINDOW (FIR FILTER) FUNCTION
Z       LOCAL  BINARY POINT BY POINT METRIC


PURPOSE 

THIS IS A GENERAL PURPOSE CEPSTRAL COMPARE PROGRAM. IT ALLOWS THE
USER TO COMPARE ANY REGION OF THE OPPOSING CEPSTRUMS AFTER A
WINDOW FUNCTION HAS BEEN APPLIED. THIS ALLOWS THE PROGRAM TO BE
USED FOR BOTH SPECTRAL ENVELOPE AND EXCITATION COMPARISONS.
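
The idea of comparing one region of two cepstra can be sketched as a windowed Euclidean distance over a chosen quefrency range: low quefrencies mostly reflect the spectral envelope, and high quefrencies mostly reflect the excitation. The Python below is a minimal illustration under that reading. The function name, argument layout, and distance formula are assumptions, not PCEP's documented behaviour.

```python
def cepstral_distance(cep_a, cep_b, window, start, stop):
    """Windowed Euclidean distance over quefrency indices start..stop-1."""
    total = 0.0
    for q in range(start, stop):
        d = window[q] * (cep_a[q] - cep_b[q])
        total += d * d
    return total ** 0.5


# Hypothetical four-point cepstra that agree at low quefrency only.
cep_a = [0.0, 1.0, 2.0, 3.0]
cep_b = [0.0, 1.0, 0.0, 0.0]
window = [1.0, 1.0, 1.0, 1.0]
envelope_d = cepstral_distance(cep_a, cep_b, window, 0, 2)
excitation_d = cepstral_distance(cep_a, cep_b, window, 2, 4)
```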


PROGRAM NAME:  PDISTORT
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


PURPOSE

THIS PROGRAM IS USED TO SYSTEMATICALLY DISTORT PITCH CONTOURS.
THE DISTORTION IS A CONSTANT RISE OR FALL IN THE PITCH PERIOD.
THE DISTORTION ONLY OCCURS IN VOICED SEGMENTS, AND THE PROGRAM
IS INTERACTIVE.
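
Read literally, the description above suggests adding a fixed offset to the pitch period of every voiced frame and leaving unvoiced frames alone. The Python sketch below illustrates that reading. It assumes that unvoiced frames are stored as a period of 0 and that the rise or fall is a constant number of samples; both are assumptions, as is the function name.

```python
def distort_pitch(periods, offset):
    """Add a constant offset to voiced pitch periods; 0 marks an unvoiced frame."""
    return [p + offset if p > 0 else p for p in periods]


# Hypothetical contour in samples, with two unvoiced frames.
distorted = distort_pitch([0, 50, 52, 0, 60], 5)
```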


PROGRAM NAME:  PTCTC

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

P       L     PITCH FILE
I       L     INPUT SPEECH
F       L     INPUT FILTERED SPEECH


PURPOSE

TO HAND PAINT A PITCH CONTOUR FOR TESTING.


PROGRAM NAME:  PCHECH

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

M       L     INPUT STATISTICS FILE
T       L     OUTPUT STATISTICS FILE
D       L     DATA FILE
A       L     ADD ON HISTOGRAM IN
O       L     ADD ON HISTOGRAM OUT
L       L     LISTING


PURPOSE

TO CHECK THE OUTPUT OF A PITCH PERIOD ESTIMATOR AGAINST
A HAND PAINTED PITCH CONTOUR.
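
A comparison of this kind usually counts voicing disagreements, counts gross period errors, and accumulates a histogram of the period differences. The Python sketch below shows one way to gather those statistics. The function name, the use of 0 for unvoiced frames, and the `tolerance` threshold are assumptions, not details taken from the program.

```python
def compare_contours(estimated, reference, tolerance):
    """Tally voicing errors, gross errors, and a histogram of period differences."""
    stats = {"frames": 0, "voicing_errors": 0, "gross_errors": 0, "histogram": {}}
    for est, ref in zip(estimated, reference):
        stats["frames"] += 1
        if (est == 0) != (ref == 0):        # one contour voiced, the other not
            stats["voicing_errors"] += 1
            continue
        if ref == 0:                        # both unvoiced: nothing to compare
            continue
        diff = est - ref
        stats["histogram"][diff] = stats["histogram"].get(diff, 0) + 1
        if abs(diff) > tolerance:
            stats["gross_errors"] += 1
    return stats


# Hypothetical contours in samples; 0 marks an unvoiced frame.
reference = [0, 80, 80, 82, 0]
estimated = [0, 80, 40, 83, 79]
stats = compare_contours(estimated, reference, tolerance=5)
```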


PROGRAM NAME:  PRNT

LANGUAGE:  FORT

CATEGORY:  GENERAL


PURPOSE

TO PRINT A PROGRAM WITH FILE NAME AND DATE


PROGRAM NAME:  SCALE

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

I       L     INPUT FILE
R       L     RESULT FILE


PURPOSE

TO SCALE A DATA FILE FOR FILTER


PROGRAM NAME:  SF

LANGUAGE:  FORT

CATEGORY:  GENERAL


SWITCH  TYPE  PURPOSE

I       L     INPUT FILE
D       L     DATA FILE
R       L     RESULT FILE
C       L     COEF. FILE


PURPOSE

TIME VARYING DIGITAL FILTER PROGRAM


PROGRAM NAME:  SPCANA
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH


SWITCH  TYPE   PURPOSE

I       LOCAL  INPUT FILE
O       LOCAL  OUTPUT SPECTRUM
D       LOCAL  BATCH (DATA) CONTROL FILE
L       LOCAL  LCD SPECTRUM OUTPUT FILE


PURPOSE 

THIS IS A GENERAL PURPOSE SPECTRUM ANALYSIS PROGRAM DESIGNED
TO DO CEPSTRUM OR LPC DECONVOLVED SPECTRUM.


PROGRAM NAME:  ZCPD

LANGUAGE:  FORT

CATEGORY:  SPEECH


SWITCH  TYPE  PURPOSE

I       L     INPUT FILE
P       L     OUTPUT PITCH CONTOUR
D       L     DATA FILE (OPTIONAL)


PURPOSE 

ZERO  CROSSING  PITCH  DETECTOR 
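
A zero crossing pitch detector generally measures the spacing between successive zero crossings in the same direction. The Python sketch below averages the spacing of positive-going crossings within one frame. It is an illustration only: the function name, the choice of positive-going crossings, the return value of 0 for frames with too few crossings, and the 8 kHz test tone are assumptions.

```python
import math


def zero_crossing_period(frame):
    """Average spacing, in samples, between positive-going zero crossings."""
    crossings = [n for n in range(1, len(frame)) if frame[n - 1] < 0 <= frame[n]]
    if len(crossings) < 2:
        return 0                            # too few crossings to measure a period
    intervals = [b - a for a, b in zip(crossings, crossings[1:])]
    return sum(intervals) / len(intervals)


# Hypothetical test signal: a 100 Hz tone sampled at 8000 Hz (period = 80 samples).
frame = [math.sin(2 * math.pi * 100 * n / 8000 + 0.3) for n in range(400)]
period = zero_crossing_period(frame)
```

On real speech the input is normally low-pass filtered first, so that formant ripple does not add extra crossings within a pitch period.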
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